Re: [Standards] Message-IDs

2018-02-28 Thread Kevin Smith
On 28 Feb 2018, at 14:47, Denver Gingerich  wrote:
> 
> On Wed, Feb 28, 2018 at 08:59:01AM +, Kevin Smith wrote:
>> On 13 Feb 2018, at 16:57, Simon Friedberger  wrote:
>>> E3. Simply make the ID: FROM-TIMESTAMP.
>>> Here FROM needs to be the eventual FROM after possible rewriting. Can
>>> that be done?
>>> And TIMESTAMP has to be strictly increasing so should have sub-second
>>> resolution.
>>> I assume this is impossible because otherwise it would be too easy. But
>>> why is it impossible? :)
>> 
>> Because timestamps aren’t monotonic? :)
> 
> Do you mean because most people use Unix time and/or other UTC-based 
> timestamps (that have leap seconds)?
> 
> If so, this can be mostly solved by using TAI timestamps.  Unfortunately, it 
> is tricky in most OSes to obtain a TAI timestamp, but I found some code that 
> does this (on many platforms anyway):
> 
> https://ossguy.com/tai.c
> 
> We've used this code for implementing usage tracking in JMP (to ensure a 
> day's length doesn't vary from day to day - it is always exactly 86,400 
> seconds long).  For details, see 
> https://gitlab.com/ossguy/sgx-catapult/commit/31c2cb7c8fbea1ad4cc6753a4343dbfc65552fa5
>  .  As you might suspect, we'd like to port the above TAI code to Ruby, but 
> it works ok as-is for now.
> 
> I realize that clock skew could still cause the TAI timestamp that your OS 
> returns to be non-monotonic (i.e. a machine issue, not an issue with TAI time 
> itself); I'm not sure if that's a substantial issue for the message IDs being 
> discussed here.

I meant because clock skew is a thing, so relying on the monotonicity doesn’t 
work. Seems like it shouldn’t be a thing, but is.
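
To illustrate the point, here is a minimal Python sketch of what happens when a wall clock is stepped backwards, and one common local workaround. The class name StrictlyIncreasingClock and the microsecond resolution are assumptions made for this example, not something proposed in the thread.

```python
import time


class StrictlyIncreasingClock:
    """Hypothetical helper: hands out integer microsecond timestamps
    that never repeat or go backwards within one process."""

    def __init__(self, now=time.time):
        self._now = now   # wall-clock source, injectable for testing
        self._last = 0    # last value handed out, in microseconds

    def next(self):
        candidate = int(self._now() * 1_000_000)
        # If the wall clock stalled or was stepped backwards,
        # fall back to "last value + 1" instead of trusting it.
        if candidate <= self._last:
            candidate = self._last + 1
        self._last = candidate
        return candidate


# Simulate a clock that is stepped back by one second mid-stream.
readings = iter([100.0, 100.5, 99.5, 99.6])
clock = StrictlyIncreasingClock(now=lambda: next(readings))
stamps = [clock.next() for _ in range(4)]
```

The guard only works inside a single process. It does not survive restarts and does nothing for two devices of the same account whose clocks disagree, which is why the raw timestamp cannot be relied on.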

/K
___
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___


Re: [Standards] Message-IDs

2018-02-28 Thread Denver Gingerich
On Wed, Feb 28, 2018 at 08:59:01AM +, Kevin Smith wrote:
> On 13 Feb 2018, at 16:57, Simon Friedberger  wrote:
> > E3. Simply make the ID: FROM-TIMESTAMP.
> > Here FROM needs to be the eventual FROM after possible rewriting. Can
> > that be done?
> > And TIMESTAMP has to be strictly increasing so should have sub-second
> > resolution.
> > I assume this is impossible because otherwise it would be too easy. But
> > why is it impossible? :)
> 
> Because timestamps aren’t monotonic? :)

Do you mean because most people use Unix time and/or other UTC-based timestamps 
(that have leap seconds)?

If so, this can be mostly solved by using TAI timestamps.  Unfortunately, it is 
tricky in most OSes to obtain a TAI timestamp, but I found some code that does 
this (on many platforms anyway):

https://ossguy.com/tai.c

We've used this code for implementing usage tracking in JMP (to ensure a day's 
length doesn't vary from day to day - it is always exactly 86,400 seconds 
long).  For details, see 
https://gitlab.com/ossguy/sgx-catapult/commit/31c2cb7c8fbea1ad4cc6753a4343dbfc65552fa5
 .  As you might suspect, we'd like to port the above TAI code to Ruby, but it 
works ok as-is for now.

I realize that clock skew could still cause the TAI timestamp that your OS 
returns to be non-monotonic (i.e. a machine issue, not an issue with TAI time 
itself); I'm not sure if that's a substantial issue for the message IDs being 
discussed here.

Denver
https://jmp.chat/


Re: [Standards] Message-IDs

2018-02-28 Thread Kevin Smith
On 28 Feb 2018, at 09:35, Jonas Wielicki  wrote:
> 
> On Mittwoch, 28. Februar 2018 10:28:01 CET Kevin Smith wrote:
>> On 26 Feb 2018, at 15:59, Simon Friedberger  wrote:
>>> So, lest this discussion just die. Here is a proposal:
>> Thanks for the proposal. Bashing follows.
>> 
>>>   Client-A generates message-ID based on HASH(connection_counter,
>>>   server_salt). The connection_counter needs to be maintained only for
>>>   one connection. The server salt is server generated, anew for each
>>>   connection and is sent to.
>>> 
>>>   Server-A checks that this is correct and uses it for MAM. This
>>>   should make life easier for clients because they only need to deal
>>>   with one ID.
>> 
>> I think stopping servers being able to use their own IDs for DB storage is
>> probably disadvantageous. Although I see the appeal of a client knowing its
>> own MAM IDs, I’m not sure that simply knowing it is sufficient - you also
>> need to know where it fits into the order of the archive, if you’re going
>> to use it for archive sync, so I’m not sure this is actually buying
>> anything, at the cost of a lack of flexibility in server implementations.
> 
> Good point about the order. This essentially means that we need a reflection. 
> Self-carbons essentially. At which point we can simply let the server 
> generate 
> the ID(s).

I’m not sure that’s true, as you want to know your ID immediately upon sending 
- e.g. for following up with LMC you don’t want to wait for roundtrips before 
you can do that. So I think you want the client to be generating at least some 
ID used for something.

/K


Re: [Standards] Message-IDs

2018-02-28 Thread Jonas Wielicki
On Mittwoch, 28. Februar 2018 10:28:01 CET Kevin Smith wrote:
> On 26 Feb 2018, at 15:59, Simon Friedberger  wrote:
> > So, lest this discussion just die. Here is a proposal:
> Thanks for the proposal. Bashing follows.
> 
> >Client-A generates message-ID based on HASH(connection_counter,
> >server_salt). The connection_counter needs to be maintained only for
> >one connection. The server salt is server generated, anew for each
> >connection and is sent to.
> >
> >Server-A checks that this is correct and uses it for MAM. This
> >should make life easier for clients because they only need to deal
> >with one ID.
> 
> I think stopping servers being able to use their own IDs for DB storage is
> probably disadvantageous. Although I see the appeal of a client knowing its
> own MAM IDs, I’m not sure that simply knowing it is sufficient - you also
> need to know where it fits into the order of the archive, if you’re going
> to use it for archive sync, so I’m not sure this is actually buying
> anything, at the cost of a lack of flexibility in server implementations.

Good point about the order. This essentially means that we need a reflection. 
Self-carbons essentially. At which point we can simply let the server generate 
the ID(s).

kind regards,
Jonas



Re: [Standards] Message-IDs

2018-02-28 Thread Kevin Smith
On 26 Feb 2018, at 15:59, Simon Friedberger  wrote:
> So, lest this discussion just die. Here is a proposal:

Thanks for the proposal. Bashing follows.

>Client-A generates message-ID based on HASH(connection_counter,
>server_salt). The connection_counter needs to be maintained only for
>one connection. The server salt is server generated, anew for each
>connection and is sent to.
> 
>Server-A checks that this is correct and uses it for MAM. This
>should make life easier for clients because they only need to deal
>with one ID.

I think stopping servers being able to use their own IDs for DB storage is 
probably disadvantageous.
Although I see the appeal of a client knowing its own MAM IDs, I’m not sure 
that simply knowing it is sufficient - you also need to know where it fits into 
the order of the archive, if you’re going to use it for archive sync, so I’m 
not sure this is actually buying anything, at the cost of a lack of 
flexibility in server implementations.

>  * Two problems need to be considered here:
>  o The client needs to maintain a counter.

The literal ‘have a counter in memory’ is trivial, although getting the rules 
for incrementing it right can be difficult - more so than for SM IDs, and 
there’s another thread at the moment about people not being able to get those 
right.

>  Even
>though I called it a counter, it does not need to be contiguous.
>It just needs to be increasing that the server can easily check
>that for a given salt value it is unique.

If it’s not contiguous, how is the server going to go about validating the hash 
of an unknown value?
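
A minimal Python sketch of that difficulty: with a contiguous counter the server recomputes exactly one hash, whereas with gaps it has to guess. The SHA-256 encoding, the function names and the search window are all assumptions made for illustration; the proposal does not define them.

```python
import hashlib


def make_id(counter: int, salt: bytes) -> str:
    # Assumed encoding: SHA-256 over the salt plus an 8-byte big-endian counter.
    return hashlib.sha256(salt + counter.to_bytes(8, "big")).hexdigest()


def find_counter(claimed_id: str, salt: bytes, last_counter: int, window: int):
    """Return the counter in (last_counter, last_counter + window] that
    produces claimed_id, or None if no value in that range matches."""
    for counter in range(last_counter + 1, last_counter + 1 + window):
        if make_id(counter, salt) == claimed_id:
            return counter
    return None


salt = b"per-connection-salt"
sent = make_id(5, salt)  # the client jumped from counter 1 to counter 5

# Contiguous rule: only counter 2 is acceptable, so the ID is rejected.
contiguous = find_counter(sent, salt, last_counter=1, window=1)
# Non-contiguous rule: the server must brute-force a window of candidates.
windowed = find_counter(sent, salt, last_counter=1, window=16)
```

With a timestamp-sized gap the window would be far too large to search, so a non-contiguous counter would in practice have to be sent alongside the hash.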

>  o The server needs to check the validity of the counter. If the
>server is actually replicated and consists of multiple machines
>this is not strictly possible.

I’m not sure I understand this. If the server salt is local to a node, and the 
connection counter is local to a connection, which is local to a node, even in 
a split cluster this should be fine?

> However, assuming normal
>operations the IDs generated by the client will be fine and if
>the servers have any mechanism for eventual consistency a
>misbehaving client will be detected.

Will they? If the server can’t check the stanza at submission time, I don’t 
think it can ever (reasonably) check it later.

>Server-B gets the message via s2s. It changes the message-ID to a
>new one and stores the original as "origin-ID”.

That’s going to break errors and all sorts isn’t it? A stanza’s ID needs to be 
stable or things will break.

>Client-B gets a message with only TWO IDs. message-ID is for
>referencing locally for MAM, origin-ID is for referencing when
>talking to the sender i.e. read receipts.

What happens with MUC? That’s an extra entity that may be doing MAM, and will 
generate new stanza IDs for the fan-out.

>If a server generates follow-up messages it makes up a new
>sender-ID. It should maybe set a “triggered-by-ID” so the client can
>determine that it triggered this message. Maybe this is unnecessary.
>The server definitely must send the message it inserted back to the
>client to ensure a common view of history.

What does ‘generates follow-up messages’ here mean?

>If a server changes a message it can keep the sender-ID but it MUST
>notify the client who sent the message to make sure that clients
>have the same view of the history.

What does ‘changes a message’ here mean? There are situations where a message 
is modified in flight and the sender can’t be told what it’s modified to.

> In this proposal stanza-IDs are not required. The message-ID is
> authoritative and when rewriting the original message-ID is kept as
> origin-ID.

I’m not sure they’re not required (see comments on MAM).

> From my original mail this solves C1, C2, C3, C4 and C5.

I’m not sure it helps with C1. It only helps with C2 by going through every 
XEP that uses a stanza ID and changing it to use an origin-ID, I think? I 
don’t think it makes a difference to C3 at all, does it? It doesn’t help C4, 
as the client still needs a bounce to get ordering right, and I don’t see how 
it handles C5.

> Also note, to make this a simpler change the clients could set both
> origin-ID and message-ID. The stanza-ID for MAM would turn out to be the
> same. This would be very similar to what is probably currently the most
> widespread behavior. Except that the origin-ID should be used for
> read-receipts, etc.

I suspect that just saying in message receipts and in LMC etc. “use the 
origin-id when present” would achieve much the same thing as this proposal?

/K


Re: [Standards] Message-IDs

2018-02-28 Thread Kevin Smith
On 13 Feb 2018, at 16:57, Simon Friedberger  wrote:
> During the discussion on the different ID types at the summit I had an
> idea for
> a possible solution to the problem but not a sufficient understanding of the
> problem to even discuss it. I tried to find somebody to discuss it with
> in chat
> afterwards but nobody was available and I forgot about it. To get it off
> my ToDo
> list, here is my current understanding. I hope it can be a basis for further
> discussion.
> 
> 
> A) Status-Quo:
> Currently there are
> A1. stanza-ID: generated by server
> A2. origin-ID: generated by client
> from https://xmpp.org/extensions/xep-0359.html and
> 
> A3. message-ID: this is the ID-attribute on the stanza
> from https://tools.ietf.org/html/rfc6120#section-8.1.3
> 
> There are also (4.) SM-IDs in stream management but those are
> per-stream and
> unrelated.
> 
> 
> B) Use-cases:
> B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
> B2. MUCs require IDs to detect reflections of own messages.
> And reflection is great because it gives everybody the same view
> on the
> MUC in the presence of things like autopastebin or other rewrites.
> B3. Error responses have the same ID-attribute as the original stanza.
> 
> 
> C) Problems with current situation:
> C1. People dislike having so many different IDs.
> This is not a problem per se but it does mean implementation
> complexity
> and confusion.

I think confusion I buy - we need to be careful to define things properly.

> C2. According to Daniel it is not clear which ID should be used when
> referencing things. In other words if he gets a delivery receipt
> for an
> ID the client might have based that on the origin-ID or the
> message-ID.
> I'm not sure if this should be considered relevant. People can
> always
> write broken clients which send back crap. Of course if it happens
> unintentionally because of (C1.) fewer IDs would help

I don’t think this is particularly unclear, (it’s the id of the stanza - all 
the other ids are newer inventions with specific contexts), but easy to clarify.

> C3. Using origin-ID to detect MUC reflection doesn't always work
> because MUCs
> may not reflect it.
> That's of course unfortunate but should IMHO be considered an error
> in the
> MUC implementation (probably a transport) and fixed there.

Maybe. I note that MUCs stripping out non-body payloads is actually a 
feature in some servers.

> C4. Clients require a bounce of their messages to learn the
> stanza-id which
> is used for MAM.
> Why do they need to know? Maybe they want to reference their own
> message.

They need to know where their stanza sits in the ordering of the archive (and 
its id) if they want to be able to do sync later.

> Do they require this bounce anyway to make sure that there was
> no rewriting?

Possibly.

> C5. Some MUCs rewrite the message-id
> Why is this allowed? It is even suggested here:
> https://xmpp.org/extensions/xep-0045.html#message

Mostly it’s allowed because the spec didn’t say not to do it, and it got moved 
to Draft, and it was implemented, and so the rules of “don’t make breaking 
changes unless unavoidable” applied and it couldn’t be sensibly changed.

> C6. A global ID to reference messages might be nice.
> C7. When referencing a message for example by "liking" it a forgeable ID
> could get you to like things you didn't intend to like.
> This is a difficult problem because in many cases it requires
> malicious
> clients and servers and those have a lot of power anyway.

Not that much power, relatively. They’re not usually able to rewrite history in 
a meaningful way, but with this they become able to (look like they) do that.

> D) Possible root cause:
> People do not trust the message IDs assigned by others and therefore
> want to
> assign their own.

I’m not sure what this is saying - the root cause of *what*?

> E) Suggested solutions, including partial solutions:
> E1. message-ID and origin-ID should always be the same, as proposed
> by Georg
> in
> https://mail.jabber.org/pipermail/standards/2017-September/033415.html
> Some concerns were voiced in that thread; the only valid one is
> that due
> to bad software we need to deal with the situation that they are
> different anyway.
> There was a privacy concern about the "by=" attribute but
> origin-ID does
> not actually have that.
> According to Daniel and Georg things currently break down anyway
> if this
> does not hold.



> E2. Make the ID verifiable: This is what I had in mind at the summit and
> after some discussion yesterday Jonas and Dave basically immediately
> came up with the same thing, so it might be re

Re: [Standards] Message-IDs

2018-02-27 Thread Jonas Wielicki
On Montag, 26. Februar 2018 16:59:46 CET Simon Friedberger wrote:
> So, lest this discussion just die. Here is a proposal:
> 
>   *
> 
> Client-A generates message-ID based on HASH(connection_counter,
> server_salt). The connection_counter needs to be maintained only for
> one connection. The server salt is server generated, anew for each
> connection and is sent to.
> 
>   *
> 
> Server-A checks that this is correct and uses it for MAM. This
> should make life easier for clients because they only need to deal
> with one ID.
> 
>   * Two problems need to be considered here:
>   o The client needs to maintain a counter. I don't know if there
> are cases where the client cannot persist this counter but keeps
> a connection. In this case a sufficiently fine grained timestamp
> to make it strictly monotonically increasing is sufficient. Even
> though I called it a counter, it does not need to be contiguous.
> It just needs to be increasing that the server can easily check
> that for a given salt value it is unique.
>   o The server needs to check the validity of the counter. If the
> server is actually replicated and consists of multiple machines
> this is not strictly possible. However, assuming normal
> operations the IDs generated by the client will be fine and if
> the servers have any mechanism for eventual consistency a
> misbehaving client will be detected. I think this fits the XMPP
> model of "robust cooperation".
>   *
> 
> Server-B gets the message via s2s. It changes the message-ID to a
> new one and stores the original as "origin-ID".
> 
>   *
> 
> Client-B gets a message with only TWO IDs. message-ID is for
> referencing locally for MAM, origin-ID is for referencing when
> talking to the sender i.e. read receipts.
> 
>   *
> 
> If a server generates follow-up messages it makes up a new
> sender-ID. It should maybe set a “triggered-by-ID” so the client can
> determine that it triggered this message. Maybe this is unnecessary.
> The server definitely must send the message it inserted back to the
> client to ensure a common view of history.
> 
>   *
> 
> If a server changes a message it can keep the sender-ID but it MUST
> notify the client who sent the message to make sure that clients
> have the same view of the history.
> 
> In this proposal stanza-IDs are not required. The message-ID is
> authoritative and when rewriting the original message-ID is kept as
> origin-ID.
> 
> From my original mail this solves C1, C2, C3, C4 and C5. Mostly just by
> defining them. This does not give us a global message-ID (C6) or
> unforgeable message-IDs (C7).
> 
> 
> Note, that I would prefer to have a globally unique ID. This is possible
> under the assumption that everybody tries to generate unique IDs and
> that non-unique IDs and misbehaving parties can be removed from the
> system. Essentially, it would look just like this except that the
> message-ID would have to include an ID for the originating server. That
> would allow recipients to check that connection_counter is increasing
> and the server_salt is unique for this server. The latter check might be
> hard to perform, though. It can still be solved using timestamps. This
> proposal seems much simpler, and it solves most of the problems.
> 
> 
> Also note, to make this a simpler change the clients could set both
> origin-ID and message-ID. The stanza-ID for MAM would turn out to be the
> same. This would be very similar to what is probably currently the most
> widespread behavior. Except that the origin-ID should be used for
> read-receipts, etc.
> 
> 
> Opinions?

I find the overall concept very appealing. Thank you for taking the time to 
work this out.

I think you overestimate some complexities there (which is good) regarding 
clustering etc. If a server uses a 128-bit random number for the server salt 
and we require the counter to be contiguous and monotonic, I don’t see any 
interaction between cluster nodes needed.

Likewise for the state keeping on the client side: If a client can keep a 
connection, it should be able to keep an 8 byte counter state along with it.

What needs to be specified is counter overflow. Could be done with a simple 
request from the client for a new salt.
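
A minimal Python sketch of the overflow handling, assuming an 8-byte counter and a 128-bit salt as mentioned above. The class ConnectionIdState is hypothetical, and the sketch models only the state change; the actual request and response for a new salt is not specified anywhere yet.

```python
import secrets

COUNTER_MAX = 2**64 - 1  # largest value of an 8-byte counter


class ConnectionIdState:
    """Hypothetical per-connection state: one salt plus one counter."""

    def __init__(self):
        self.salt = secrets.token_bytes(16)  # 128-bit random salt
        self.counter = 0

    def advance(self):
        """Move to the next counter value. When the counter is exhausted,
        rotate the salt and restart instead of wrapping around."""
        if self.counter == COUNTER_MAX:
            self.salt = secrets.token_bytes(16)
            self.counter = 0
        self.counter += 1
        return self.salt, self.counter


state = ConnectionIdState()
first_salt = state.salt
state.counter = COUNTER_MAX  # pretend the counter has been used up
new_salt, new_counter = state.advance()
```

Because the (salt, counter) pair never repeats, the hash inputs stay unique across the rollover without any coordination between cluster nodes.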

I don’t see a good way to integrate the date in the message ID though (cc @ 
Zash). Even if we let the server define a mandatory prefix, which it could 
incidentally set to the date, a way to handle date changes during a connection 
would be needed.

kind regards,
Jonas



Re: [Standards] Message-IDs

2018-02-26 Thread Simon Friedberger
So, lest this discussion just die. Here is a proposal:

  *

Client-A generates message-ID based on HASH(connection_counter,
server_salt). The connection_counter needs to be maintained only for
one connection. The server salt is server generated, anew for each
connection and is sent to the client.

  *

Server-A checks that this is correct and uses it for MAM. This
should make life easier for clients because they only need to deal
with one ID.

  * Two problems need to be considered here:
  o The client needs to maintain a counter. I don't know if there
are cases where the client cannot persist this counter but keeps
a connection. In this case a sufficiently fine grained timestamp
to make it strictly monotonically increasing is sufficient. Even
though I called it a counter, it does not need to be contiguous.
It just needs to be increasing that the server can easily check
that for a given salt value it is unique.
  o The server needs to check the validity of the counter. If the
server is actually replicated and consists of multiple machines
this is not strictly possible. However, assuming normal
operations the IDs generated by the client will be fine and if
the servers have any mechanism for eventual consistency a
misbehaving client will be detected. I think this fits the XMPP
model of "robust cooperation".
  *

Server-B gets the message via s2s. It changes the message-ID to a
new one and stores the original as "origin-ID".

  *

Client-B gets a message with only TWO IDs. message-ID is for
referencing locally for MAM, origin-ID is for referencing when
talking to the sender i.e. read receipts.

  *

If a server generates follow-up messages it makes up a new
sender-ID. It should maybe set a “triggered-by-ID” so the client can
determine that it triggered this message. Maybe this is unnecessary.
The server definitely must send the message it inserted back to the
client to ensure a common view of history.

  *

If a server changes a message it can keep the sender-ID but it MUST
notify the client who sent the message to make sure that clients
have the same view of the history.

In this proposal stanza-IDs are not required. The message-ID is
authoritative and when rewriting the original message-ID is kept as
origin-ID.
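
The HASH(connection_counter, server_salt) step of the proposal could be sketched in Python roughly as follows. SHA-256, the byte encoding, the contiguous-counter rule and the Client and Server class names are assumptions made for this example; the proposal itself leaves all of them open.

```python
import hashlib
import secrets


def message_id(counter: int, salt: bytes) -> str:
    # Assumed encoding: SHA-256 over the salt plus an 8-byte big-endian counter.
    return hashlib.sha256(salt + counter.to_bytes(8, "big")).hexdigest()


class Client:
    """Client-A: derives each message-ID from its own counter and the salt."""

    def __init__(self, salt: bytes):
        self.salt = salt
        self.counter = 0

    def next_id(self) -> str:
        self.counter += 1
        return message_id(self.counter, self.salt)


class Server:
    """Server-A: issues one salt per connection and checks each ID."""

    def __init__(self):
        self.salt = secrets.token_bytes(16)  # fresh for every connection
        self.expected = 1                    # next counter value it will accept

    def accept(self, claimed_id: str) -> bool:
        if claimed_id != message_id(self.expected, self.salt):
            return False
        self.expected += 1
        return True


server = Server()
client = Client(server.salt)  # the salt is sent to the client at connect time
first = client.next_id()
ok_first = server.accept(first)             # matches counter 1
ok_replay = server.accept(first)            # same ID again: rejected
ok_second = server.accept(client.next_id()) # matches counter 2
```

The server keeps only a salt and one integer per connection, so the check needs no shared state as long as the connection stays on one node.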

From my original mail this solves C1, C2, C3, C4 and C5. Mostly just by
defining them. This does not give us a global message-ID (C6) or
unforgeable message-IDs (C7).


Note, that I would prefer to have a globally unique ID. This is possible
under the assumption that everybody tries to generate unique IDs and
that non-unique IDs and misbehaving parties can be removed from the
system. Essentially, it would look just like this except that the
message-ID would have to include an ID for the originating server. That
would allow recipients to check that connection_counter is increasing
and the server_salt is unique for this server. The latter check might be
hard to perform, though. It can still be solved using timestamps. This
proposal seems much simpler, and it solves most of the problems.


Also note, to make this a simpler change the clients could set both
origin-ID and message-ID. The stanza-ID for MAM would turn out to be the
same. This would be very similar to what is probably currently the most
widespread behavior. Except that the origin-ID should be used for
read-receipts, etc.


Opinions?



Re: [Standards] Message-IDs

2018-02-17 Thread Jonas Wielicki
On Dienstag, 13. Februar 2018 21:42:56 CET Simon Friedberger wrote:
> >> ...
> > You are mixing multiple problems with multiple solutions, which was
> > probably in an effort to get the whole picture, but also leads to
> > confusion. I personally would like to concentrate on solving C4, where
> > you pointed out a promising candidate for a solution: E2
> 
> Indeed. Mostly because I still don't think that I understand the
> complete picture.
> For example, if we are only trying to solve C4, is that really worth the
> effort?
> Does it do anything more than save a round-trip?

Yes. The "round-trip" you’re speaking of may be excessively expensive. 
Essentially, if a client wants to know the stanza-id of a message it sent, it 
needs to do a MAM query starting with the last known stanza-id and do some 
matching. There is no other way (because you don’t get carbons for messages 
you sent yourself). No client is doing this afaik. Clients which do not do 
this have to resort to some kind of heuristic when syncing MAM at a later 
point.

So we’re solving a "round trip or annoying heuristic" situation. This is worse 
than it sounds, because it makes clients much more complex (or I am missing 
something; that would be great.): If a client wants to refer to messages 
internally by some unique ID, it would be natural to use the stanza-id, 
because that ID can be used with MAM queries, too. However, that’s not 
possible if you don’t know the stanza-id for outbound messages. So instead, 
clients need to add a layer of indirection with yet-another client-internal ID 
for the message (probably most of the time some type of auto-increment 
integer).
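
A minimal Python sketch of that layer of indirection, assuming the client eventually learns the stanza-id by matching on its own origin-id. The MessageIndex class and its method names are hypothetical and not taken from any existing client.

```python
import itertools


class MessageIndex:
    """Hypothetical client-side store keyed by an auto-increment local ID."""

    def __init__(self):
        self._next_local = itertools.count(1)
        self._by_local = {}          # local id -> message record
        self._local_by_origin = {}   # origin-id -> local id
        self._local_by_stanza = {}   # stanza-id -> local id

    def record_outbound(self, origin_id: str, body: str) -> int:
        """Store a message at send time, before any stanza-id is known."""
        local_id = next(self._next_local)
        self._by_local[local_id] = {
            "origin_id": origin_id, "stanza_id": None, "body": body,
        }
        self._local_by_origin[origin_id] = local_id
        return local_id

    def learn_stanza_id(self, origin_id: str, stanza_id: str):
        """Attach the archive's stanza-id once it shows up (reflection or MAM)."""
        local_id = self._local_by_origin.get(origin_id)
        if local_id is None:
            return None
        self._by_local[local_id]["stanza_id"] = stanza_id
        self._local_by_stanza[stanza_id] = local_id
        return local_id

    def stanza_id_of(self, local_id: int):
        return self._by_local[local_id]["stanza_id"]


index = MessageIndex()
local = index.record_outbound("origin-abc", "hello")
before = index.stanza_id_of(local)                      # not known yet
matched = index.learn_stanza_id("origin-abc", "archive-17")
after = index.stanza_id_of(local)
```

Three maps for one message is the extra complexity in question: if the client knew the stanza-id at send time, a single key would do.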


kind regards,
Jonas



Re: [Standards] Message-IDs

2018-02-14 Thread Simon Friedberger
Hi Michal,

thank you for your comments. I will address them inline.

> I'm really tempted to say that the new message routing (in next gen
> XMPP as discussed during summit)
> must require the message stanza to have an "id" attribute. I personally
> think that UUID v4 would be enough here.
> This, to my knowledge, is hard to guess so a malicious user is
> probably not able to guess the next ID.
> What it can do, though, is "reuse" the same ID in another message,
> which may be a bad thing.
So from the discussion we had in the summit-MUC it seems like abusing a guessed
ID is not possible anyway if senders are properly verified. If anybody thinks
otherwise, please speak up!
Indeed, reusing IDs for different messages is always possible but can
be mitigated by requiring the ID to be a function of the message.
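
A minimal Python sketch of an ID that is a function of the message. Which fields go into the hash (sender, body and a per-message nonce here) and the name content_bound_id are assumptions made for illustration only.

```python
import hashlib


def content_bound_id(sender: str, body: str, nonce: str) -> str:
    """Derive an ID from the message itself, so the same ID cannot be
    attached to a different message without the mismatch being detectable."""
    digest = hashlib.sha256()
    for part in (sender, body, nonce):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


original = content_bound_id("alice@example.com", "lunch at noon?", "n-1")
same = content_bound_id("alice@example.com", "lunch at noon?", "n-1")
altered = content_bound_id("alice@example.com", "transfer the money", "n-1")
```

Any recipient can recompute the hash and see that an ID reused for a different body does not match. The nonce keeps two identical messages from the same sender from colliding.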

>     E2. ...
>
>
> Making the id verifiable (in the most efficient way) would be perfect.
> I think, here we need to remember that not every client will have SM
> enabled, so it may not have the sm-counter.
Good point, thanks for bringing it up. This can probably be solved using
something like the salt based variant of E2.


Re: [Standards] Message-IDs

2018-02-14 Thread Michal Piotrowski
Hi Simon

Thanks for refreshing the topic.
Few things from me below (perspective of XMPP server developer, MongooseIM).


Best regards
Michal Piotrowski
michal.piotrow...@erlang-solutions.com

On 13 February 2018 at 17:57, Simon Friedberger 
wrote:

> Hello List!
>
>
> During the discussion on the different ID types at the summit I had an
> idea for
> a possible solution to the problem but not a sufficient understanding of
> the
> problem to even discuss it. I tried to find somebody to discuss it with
> in chat
> afterwards but nobody was available and I forgot about it. To get it off
> my ToDo
> list, here is my current understanding. I hope it can be a basis for
> further
> discussion.
>
>
> A) Status-Quo:
> Currently there are
> A1. stanza-ID: generated by server
> A2. origin-ID: generated by client
> from https://xmpp.org/extensions/xep-0359.html and
>
> A3. message-ID: this is the ID-attribute on the stanza
> from https://tools.ietf.org/html/rfc6120#section-8.1.3
>
> There are also (4.) SM-IDs in stream management but those are
> per-stream and
> unrelated.
>
>
> B) Use-cases:
> B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
> B2. MUCs require IDs to detect reflections of own messages.
> And reflection is great because it gives everybody the same view
> on the
> MUC in the presence of things like autopastebin or other rewrites.
> B3. Error responses have the same ID-attribute as the original stanza.
>
>
> C) Problems with current situation:
> C1. People dislike having so many different IDs.
> This is not a problem per se but it does mean implementation
> complexity
> and confusion.
>

I'm really tempted to say that the new message routing (in next gen XMPP as
discussed during summit) must require the message stanza to have an "id"
attribute. I personally think that UUID v4 would be enough here.
This, to my knowledge, is hard to guess, so a malicious user is probably not
able to guess the next ID.
What they can do, though, is "reuse" the same ID in another message, which
may be a bad thing.
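
A minimal Python sketch of both halves of that remark: generating a version 4 UUID as the stanza id, and noticing when an id is reused. The ReuseDetector class is hypothetical; how long a real server would remember ids, and per which scope, is left open here.

```python
import uuid


def new_message_id() -> str:
    """Return a random (version 4) UUID to use as the stanza "id" attribute."""
    return str(uuid.uuid4())


class ReuseDetector:
    """Hypothetical per-sender record of ids that were already seen."""

    def __init__(self):
        self._seen = set()

    def is_reused(self, message_id: str) -> bool:
        if message_id in self._seen:
            return True
        self._seen.add(message_id)
        return False


detector = ReuseDetector()
first_id = new_message_id()
second_id = new_message_id()
fresh = detector.is_reused(first_id)      # first sighting
repeated = detector.is_reused(first_id)   # same id sent again
```

Randomness makes the next id unguessable, but only a check like the second one addresses deliberate reuse by the sender.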

> C2. According to Daniel it is not clear which ID should be used when
> referencing things. In other words if he gets a delivery receipt
> for an
> ID the client might have based that on the origin-ID or the
> message-ID.
> I'm not sure if this should be considered relevant. People can
> always
> write broken clients which send back crap. Of course if it happens
> unintentionally because of (C1.) fewer IDs would help
> C3. Using origin-ID to detect MUC reflection doesn't always work
> because MUCs
> may not reflect it.
> That's of course unfortunate but should IMHO be considered an error
> in the
> MUC implementation (probably a transport) and fixed there. I
> understand
> that it might be difficult in some cases
> ( https://lab.louiz.org/louiz/biboumi/issues/3283 ) but as Daniel
> already pointed out yesterday it is much easier to fix a transport,
> since it knows which protocol it is talking, to instead of working
> around it at the end.
> In any case the current situation seems to be bad:
>
> https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_
> Chat#Matching_Your_Reflected_Message
> C4. Clients require a bounce of their messages to learn the
> stanza-id which
> is used for MAM.
> Why do they need to know? Maybe they want to reference their own
> message.
>

They may need that, for instance, to know where to start syncing the archive
from after being offline.


> Do they require this bounce anyway to make sure that there was
> no rewriting?
> C5. Some MUCs rewrite the message-id
> Why is this allowed? It is even suggested here:
> https://xmpp.org/extensions/xep-0045.html#message
> C6. A global ID to reference messages might be nice.
> C7. When referencing a message for example by "liking" it a forgeable
> ID
> could get you to like things you didn't intend to like.
> This is a difficult problem because in many cases it requires
> malicious
> clients and servers and those have a lot of power anyway.
>
>
> D) Possible root cause:
> People do not trust the message IDs assigned by others and therefore
> want to
> assign their own.
>
>
> E) Suggested solutions, including partial solutions:
> E1. message-ID and origin-ID should always be the same, as proposed
> by Georg
> in
> https://mail.jabber.org/pipermail/standards/2017-September/033415.html
> Some concerns were voiced in that thread; the only valid one is
> that due
> to bad software we need to deal with the situation that they are
> different anyway.
> There was a privacy concern about the "by=" attribute but
> origin-ID does
> not actually have that.
> According to Daniel and 

Re: [Standards] Message-IDs

2018-02-13 Thread Florian Schmaus
On 13.02.2018 21:42, Simon Friedberger wrote:
> On 13.02.2018 17:57, Simon Friedberger wrote:
>>>     C2. According to Daniel it is not clear which ID should be used when
>>>     referencing things. In other words if he gets a delivery receipt
>>> for an
>>>     ID the client might have based that on the origin-ID or the
>>> message-ID.
>> Delivery receipts predate xep359 so it is safe to say that the intention
>> is that delivery receipts use rfc6120-ids. While it is IMHO obvious from
>> reading xep184 that it is based on rfc6120-ids, it can't hurt to specify
>> this more explicitly.
> But looking at https://xmpp.org/extensions/xep-0045.html#message
> the message-ID seems to be rewritten to different values for different
> recipients.
> How can a client who gets a delivery receipt with such an ID figure out
> which
> message it is for?

You cannot reliably figure it out with the current specifications. One
possible option is to extend xep184 receipts to (optionally) include
xep359 IDs. Maybe that would even be a backwards compatible change, e.g.
clients could check for the xep359 ID in the receipt and fall back to
the rfc6120 ID.
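
A minimal sketch of that fallback on the receiving side, assuming a receipt
could carry an optional xep359 origin-ID next to the rfc6120 ID. No such
extension exists in xep184 today, so the function and parameter names below
are purely hypothetical.

```python
from typing import Dict, Optional


def match_receipt(
    sent_by_origin_id: Dict[str, str],
    sent_by_message_id: Dict[str, str],
    receipt_message_id: str,
    receipt_origin_id: Optional[str] = None,
) -> Optional[str]:
    """Return the sent message a receipt refers to, or None if unknown.

    Prefers the (hypothetical) origin-ID carried in the receipt and falls
    back to the rfc6120 message ID when it is absent or unknown.
    """
    if receipt_origin_id is not None and receipt_origin_id in sent_by_origin_id:
        return sent_by_origin_id[receipt_origin_id]
    return sent_by_message_id.get(receipt_message_id)


if __name__ == "__main__":
    by_origin = {"origin-1": "hello"}
    by_msg_id = {"msg-1": "hello"}
    # The message ID was rewritten on the way, the origin-ID still matches.
    print(match_receipt(by_origin, by_msg_id, "rewritten-42", "origin-1"))
    # Old-style receipt without an origin-ID.
    print(match_receipt(by_origin, by_msg_id, "msg-1"))
```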

- Florian





Re: [Standards] Message-IDs

2018-02-13 Thread Simon Friedberger
Hi Florian,
thanks for chiming in!

On 13.02.2018 17:57, Simon Friedberger wrote:
>>     C2. According to Daniel it is not clear which ID should be used when
>>     referencing things. In other words if he gets a delivery receipt
>> for an
>>     ID the client might have based that on the origin-ID or the
>> message-ID.
> Delivery receipts predate xep359 so it is safe to say that the intention
> is that delivery receipts use rfc6120-ids. While it is IMHO obvious from
> reading xep184 that it is based on rfc6120-ids, it can't hurt to specify
> this more explicitly.
But looking at https://xmpp.org/extensions/xep-0045.html#message
the message-ID seems to be rewritten to different values for different
recipients.
How can a client who gets a delivery receipt with such an ID figure out
which
message it is for?

>> E) Suggested solutions, including partial solutions:
>>     E1. message-ID and origin-ID should always be the same,
>>     According to Daniel and Georg things currently break down anyway
>> if this does not hold.
> I don't know why things should break down if this does not hold.
I think because it is difficult to match IDs to messages due to the reasons
mentioned above.

>>     C3. Using origin-ID to detect MUC reflection doesn't always work
>> because MUCs
>>     may not reflect it.
> A short note: If a MUC service announces support for 'urn:xmpp:sid:0'
> then the service is required to reflect the xep359 IDs. So clients are
> at least able to determine if the MUC will reflect the xep359 extension
> elements (but not if the MUC won't).
And client developers should probably refuse to join MUCs that don't.
Mandating it in the standard might still be good motivation for transport
implementers.
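
For illustration, a client-side check along these lines might look as
follows. It assumes the client already has the MUC's disco#info features as a
set of strings and keeps a set of origin-IDs for messages it is still waiting
to see reflected. The function name and the three-valued return are
assumptions of this sketch, not something a XEP defines.

```python
from typing import Optional, Set

SID_FEATURE = "urn:xmpp:sid:0"


def is_own_reflection(
    muc_features: Set[str],
    pending_origin_ids: Set[str],
    incoming_origin_id: Optional[str],
) -> Optional[bool]:
    """Decide whether an incoming groupchat message is our own reflection.

    Returns True or False when the MUC announces urn:xmpp:sid:0, and None
    when it does not, because origin-IDs cannot be relied on in that case.
    """
    if SID_FEATURE not in muc_features:
        return None
    return incoming_origin_id is not None and incoming_origin_id in pending_origin_ids


if __name__ == "__main__":
    pending = {"origin-1"}
    print(is_own_reflection({SID_FEATURE}, pending, "origin-1"))
    print(is_own_reflection({SID_FEATURE}, pending, "origin-2"))
    print(is_own_reflection(set(), pending, "origin-1"))
```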

>>     C5. Some MUCs rewrite the message-id
>>     Why is this allowed? It is even suggested here:
>>     https://xmpp.org/extensions/xep-0045.html#message
> Hehe, that's an old discussion. Some people argue that the reflected
> message is not the initial message and thus, could get a new ID. I also
> think that the MUC way wants to enforce unique IDs for reflected
> messages, which may not be guaranteed if the MUC service would need to
> use the client provided ID.
>
> No matter what, I doubt that this will change in the future. Although I
> have currently a neutral stance, XEP-0045 is to some degree set in
> stone, it is unlikely to get such a fundamental change.
This is an interesting point. I overlooked that it is exacerbated by
the fact that some MUCs split messages so an ID for some messages is
simply not available.
Hm...
What is the correct behavior here?
Clearly, having messages with the same ID does not work for referencing,
corrections, whatever..
On the other hand if a new ID is generated the client needs to be told
that the server just made it say something and it can now expect delivery
receipts for that.

When hashing the message this is forced. It will change the ID
and the client has to know.

I don't see how this can be solved without a "bounce" (in quotes, since it
isn't really a bounce when the server itself generated the message).


>> ...
> Sounds like an interesting approach which we should explore.
But apparently it doesn't work. xD


>> ...
> You are mixing multiple problems with multiple solutions, which was
> probably in an effort to get the whole picture, but also leads to
> confusion. I personally would like to concentrate on solving C4, where
> you pointed out a promising candidate for a solution: E2
Indeed. Mostly because I still don't think that I understand the
complete picture.
For example, if we are only trying to solve C4, is that really worth the
effort?
Does it do anything more than save a round-trip?


Re: [Standards] Message-IDs

2018-02-13 Thread Simon Friedberger
(Hm..sorry for the screwed up line lengths.)

Note that my suggestion:

On 13.02.2018 17:57, Simon Friedberger wrote:
>     E2. Make the ID verifiable: This is what I had in mind at the summit and
>     after some discussion yesterday Jonas and Dave basically immediately
>     came up with the same thing, so it might be reasonably straightforward.
>     Basically, the client calculates the ID based on some information that
>     it shares with the server like HASH(stream-id || sm-counter). This would
>     allow the server to verify that the client generated a proper ID. Jonas
>     suggested HMAC(key=stream-id, msg=sm-counter). If the message is in a
>     MUC, the MUC server can provide the user with some salt and then a
>     HASH(message-counter || salt) could be used to ensure that proper unique
>     IDs are generated.
>     This ID is based on there being a party which is in charge of checking
>     the IDs. If you connect to a malicious MUC with malicious clients they
>     can still send you whatever. I don't think that is a problem, is it?

does not solve the problem that a malicious server can send out
messages with duplicate IDs. The servers or clients receiving
them have no way to check. This could be fixed by including the
message body (and whatever else seems appropriate) in the hash,
which would leave us with something like
HASH(message-body || HASH(stream-id || sm-counter))
and HASH(stream-id || sm-counter) would have to be transmitted to
remote servers. This does prevent (C7.):
>     C7. When referencing a message for example by "liking" it a forgeable ID
>     could get you to like things you didn't intend to like.
>     This is a difficult problem because in many cases it requires 
> malicious
>     clients and servers and those have a lot of power anyway.

But I'm not sure it is necessary given that messages are not
authenticated anyway. They aren't even for OMEMO.
They could theoretically be but the attack still seems a bit
academic. Anyway, hashes are generally cheap and it might not
hurt to include the entire message in the hash.
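
To make the construction concrete, here is a minimal sketch of
HASH(message-body || HASH(stream-id || sm-counter)). It assumes SHA-256 as
the hash, UTF-8 encoded inputs and a NUL byte between stream-id and counter
so the concatenation is unambiguous. None of these choices were settled in
the discussion, and the function names are made up for the example.

```python
import hashlib
import hmac


def inner_hash(stream_id: str, sm_counter: int) -> bytes:
    """HASH(stream-id || sm-counter); the NUL separator is an assumption."""
    data = stream_id.encode("utf-8") + b"\x00" + str(sm_counter).encode("ascii")
    return hashlib.sha256(data).digest()


def message_id(body: str, inner: bytes) -> str:
    """HASH(message-body || inner), hex encoded for use as an ID."""
    return hashlib.sha256(body.encode("utf-8") + inner).hexdigest()


def verify_message_id(claimed_id: str, body: str, inner: bytes) -> bool:
    """Check a claimed ID, given the body and the transmitted inner hash."""
    expected = message_id(body, inner)
    # Constant-time comparison; not strictly needed here, but cheap.
    return hmac.compare_digest(claimed_id.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    inner = inner_hash("stream-abc", 7)
    mid = message_id("hello world", inner)
    print(verify_message_id(mid, "hello world", inner))   # matches
    print(verify_message_id(mid, "something else", inner))  # body was swapped
```

A remote party that receives the body, the inner hash and the ID can
recompute the outer hash, so an ID cannot be reused for a different body
without being noticed. It still cannot check that the inner hash itself is
honest, since it does not know the stream-id.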


[Standards] Message-IDs

2018-02-13 Thread Simon Friedberger
Hello List!


During the discussion on the different ID types at the summit I had an idea
for a possible solution to the problem but not a sufficient understanding of
the problem to even discuss it. I tried to find somebody to discuss it with in
chat afterwards but nobody was available and I forgot about it. To get it off
my ToDo list, here is my current understanding. I hope it can be a basis for
further discussion.


A) Status-Quo:
    Currently there are
    A1. stanza-ID: generated by server
    A2. origin-ID: generated by client
    from https://xmpp.org/extensions/xep-0359.html and

    A3. message-ID: this is the ID-attribute on the stanza
    from https://tools.ietf.org/html/rfc6120#section-8.1.3

    There are also (4.) SM-IDs in stream management but those are per-stream
    and unrelated.
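
    To show where the three IDs from A1 to A3 live on the wire, here is a
    small sketch that pulls them out of a message stanza. The element names
    and the urn:xmpp:sid:0 namespace come from XEP-0359; the example stanza,
    its JIDs and ID values, and the function name are invented for
    illustration.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"

# Invented example stanza carrying all three kinds of ID.
EXAMPLE = """
<message xmlns='jabber:client' id='msg-1' to='room@muc.example' type='groupchat'>
  <body>hello</body>
  <origin-id xmlns='urn:xmpp:sid:0' id='origin-1'/>
  <stanza-id xmlns='urn:xmpp:sid:0' id='archive-1' by='room@muc.example'/>
</message>
"""


def extract_ids(stanza_xml: str) -> dict:
    """Return the message-ID, origin-ID and stanza-IDs found in a stanza."""
    msg = ET.fromstring(stanza_xml.strip())
    origin = msg.find(f"{{{SID_NS}}}origin-id")
    # There can be several stanza-IDs, one per assigning entity ('by').
    stanza_ids = {
        el.get("by"): el.get("id")
        for el in msg.findall(f"{{{SID_NS}}}stanza-id")
    }
    return {
        "message_id": msg.get("id"),  # A3: id attribute on the stanza
        "origin_id": origin.get("id") if origin is not None else None,  # A2
        "stanza_ids": stanza_ids,  # A1
    }


if __name__ == "__main__":
    print(extract_ids(EXAMPLE))
```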


B) Use-cases:
    B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
    B2. MUCs require IDs to detect reflections of own messages.
    And reflection is great because it gives everybody the same view on the
    MUC in the presence of things like autopastebin or other rewrites.
    B3. Error responses have the same ID-attribute as the original stanza.


C) Problems with current situation:
    C1. People dislike having so many different IDs.
    This is not a problem per se but it does mean implementation complexity
    and confusion.
    C2. According to Daniel it is not clear which ID should be used when
    referencing things. In other words if he gets a delivery receipt for an
    ID the client might have based that on the origin-ID or the message-ID.
    I'm not sure if this should be considered relevant. People can always
    write broken clients which send back crap. Of course if it happens
    unintentionally because of (C1.) fewer IDs would help.
    C3. Using origin-ID to detect MUC reflection doesn't always work because
    MUCs may not reflect it.
    That's of course unfortunate but should IMHO be considered an error in
    the MUC implementation (probably a transport) and fixed there. I
    understand that it might be difficult in some cases
    ( https://lab.louiz.org/louiz/biboumi/issues/3283 ) but as Daniel
    already pointed out yesterday it is much easier to fix a transport,
    since it knows which protocol it is talking to, instead of working
    around it at the end.
    In any case the current situation seems to be bad:
    https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_Chat#Matching_Your_Reflected_Message
    C4. Clients require a bounce of their messages to learn the stanza-id
    which is used for MAM.
    Why do they need to know? Maybe they want to reference their own message.
    Do they require this bounce anyway to make sure that there was no
    rewriting?
    C5. Some MUCs rewrite the message-id
    Why is this allowed? It is even suggested here:
    https://xmpp.org/extensions/xep-0045.html#message
    C6. A global ID to reference messages might be nice.
    C7. When referencing a message for example by "liking" it a forgeable ID
    could get you to like things you didn't intend to like.
    This is a difficult problem because in many cases it requires malicious
    clients and servers and those have a lot of power anyway.


D) Possible root cause:
    People do not trust the message IDs assigned by others and therefore want
    to assign their own.


E) Suggested solutions, including partial solutions:
    E1. message-ID and origin-ID should always be the same, as proposed by
    Georg in
    https://mail.jabber.org/pipermail/standards/2017-September/033415.html
    Some concerns were voiced in that thread; the only valid one is that due
    to bad software we need to deal with the situation that they are
    different anyway.
    There was a privacy concern about the "by=" attribute but origin-ID does
    not actually have that.
    According to Daniel and Georg things currently break down anyway if this
    does not hold.
    E2. Make the ID verifiable: This is what I had in mind at the summit and
    after some discussion yesterday Jonas and Dave basically immediately
    came up with the same thing, so it might be reasonably straightforward.
    Basically, the client calculates the ID based on some information that
    it shares with the server like HASH(stream-id || sm-counter). This would
    allow the server to verify that the client generated a proper ID. Jonas
    suggested HMAC(key=stream-id, msg=sm-counter). If the message is in a
    MUC, the MUC server can provide the user with some salt and then a
    HASH(message-counter || salt) could be used to ensure that proper unique
    IDs are generated.
    This ID is based on there being a party which is in charge of
checking
    the IDs. If you connect to a malicious MUC with malicious
client