Re: [Standards] UPDATED: XEP-0292 (vCard4 Over XMPP)
On 6/24/11 10:02 AM, XMPP Extensions Editor wrote: > Version 0.5 of XEP-0292 (vCard4 Over XMPP) has been released. > > Abstract: This document specifies an XMPP extension for use of the > vCard4 XML format in XMPP systems, with the intent of obsoleting the > vcard-temp format. > > Changelog: Corrected XSLT script; provided detailed examples of the > vcard-temp and vCard4 XML formats. (psa) > > Diff: http://xmpp.org/extensions/diff/api/xep/0292/diff/0.4/vs/0.5 > > URL: http://xmpp.org/extensions/xep-0292.html Folks, I'm done with this one for a little while (too many other things to work on). Reviews would be welcome. As background, to make progress on this spec I compared the vCard "flavors" specified in these documents: vcard3 = http://tools.ietf.org/html/rfc2426 vcard3 XML = http://tools.ietf.org/id/draft-dawson-vcard-xml-dtd-03.txt vcard-temp = http://xmpp.org/extensions/xep-0054.html vcard4 = http://tools.ietf.org/html/draft-ietf-vcarddav-vcardrev-22 vcard4 XML = http://tools.ietf.org/html/draft-ietf-vcarddav-vcardxml-11 There are many differences and discrepancies between those flavors, complicating the task of migrating from vcard-temp to vcard4 XML. Although I think that I've defined things correctly now, there might be some more ugly beasts lurking about in this dark forest. :) Thanks! /psa
Re: [Standards] RTT: no negotiation of the feature
On Jun 24, 2011, at 8:53 AM, Mark Rejhon wrote: > On Fri, Jun 24, 2011 at 11:43 AM, Kurt Zeilenga > wrote: > > earlier? I think this will make most people happy, and will only add a > > few lines to the spec. See for example XEP-0085, section 4: > > http://xmpp.org/extensions/xep-0085.html > > > > Kurt et cetra, would this be satisfactory in the short term? > > Yes. > > Ok, it's not a painful change, and allows me to get the spec up sooner before > too many companies do damage with proprietary RTT. Kurt? > > > > It would at least mean XMPP RTT would now have a basic mechanism of > > discovering whether the other end supports RTT, and being able to restrain > > from sending RTT if the other end does not support RTT. This would not be > > the complete session negotiation algorithm, but would allay the cheif > > concern of Kurt. > > Correct, and it would allow for fall back to unextended XMPP if RTT was not > available end-to-end, which I would think quite important in emergency and > deaf communications. > > Yes, but RTT is backwards compatible, so both RTT and non-RTT conversations > look exactly the same to a client that do not support RTT. My point is that if one uses an extension without first successfully negotiating it, one runs the risk of being blocked, which is a far more disruptive approach than simply disrupting feature negotiation. Where an extension causes harm (or is perceived to be harmful), one can expect service/network operators to take steps to prevent that harm (or perceived harm). Operators would much rather just prevent such extensions by disrupting the negotiation of the feature's use than take more disruptive action, such as dropping the XMPP sessions of clients that send RTT without first successfully completing negotiation of the feature. But if push comes to shove, the need for the network to operate well will likely trump the desire of a few users to use some extension deemed harmful to the network. -- Kurt
[Standards] UPDATED: XEP-0292 (vCard4 Over XMPP)
Version 0.5 of XEP-0292 (vCard4 Over XMPP) has been released. Abstract: This document specifies an XMPP extension for use of the vCard4 XML format in XMPP systems, with the intent of obsoleting the vcard-temp format. Changelog: Corrected XSLT script; provided detailed examples of the vcard-temp and vCard4 XML formats. (psa) Diff: http://xmpp.org/extensions/diff/api/xep/0292/diff/0.4/vs/0.5 URL: http://xmpp.org/extensions/xep-0292.html
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 11:43 AM, Kurt Zeilenga wrote: > > > earlier? I think this will make most people happy, and will only add a > > few lines to the spec. See for example XEP-0085, section 4: > > http://xmpp.org/extensions/xep-0085.html > > > > Kurt et cetra, would this be satisfactory in the short term? > Yes. > OK, it's not a painful change, and it allows me to get the spec up sooner, before too many companies do damage with proprietary RTT. Kurt? > It would at least mean XMPP RTT would now have a basic mechanism of > discovering whether the other end supports RTT, and being able to restrain > from sending RTT if the other end does not support RTT. This would not be > the complete session negotiation algorithm, but would allay the cheif > concern of Kurt. > > Correct, and it would allow for fall back to unextended XMPP if RTT was not > available end-to-end, which I would think quite important in emergency and > deaf communications. > Yes, but RTT is backwards compatible, so both RTT and non-RTT conversations look exactly the same to a client that does not support RTT. In fact, if one wanted, one could even have group chats with a mix of RTT and non-RTT clients, even though I don't explicitly mention support for group chats because of the considerations I published at http://www.marky.com/realjabber/XMPP-RTT-Supplement_2011-06-17.pdf ... linked from http://www.realjabber.org Right now, the RTT spec is defined for one-on-one conversations (even though the RTT spec can be used verbatim for group chats). I mentioned group chats in the previous specification, but for simplicity I removed the mention of group chat support, even though the RTT protocol continues to be compatible with use in group chats. Mark Rejhon
Re: [Standards] RTT: no negotiation of the feature
On Jun 24, 2011, at 8:27 AM, Mark Rejhon wrote: > 2011/6/24 Remko Tronçon > Hi Mark, > > On Fri, Jun 24, 2011 at 5:15 PM, Mark Rejhon wrote: > > The earlier spec covered feature negotiation via XEP-0020. However, it was > > encouraged by a few people that spec simplification became more important, > > and to focus chiefly on the most basic, core interop issues, at least for > > the first published version of the specification. > > How about just using Disco/Caps discovery for now, as was suggested > earlier? I think this will make most people happy, and will only add a > few lines to the spec. See for example XEP-0085, section 4: > http://xmpp.org/extensions/xep-0085.html > > Kurt et cetra, would this be satisfactory in the short term? Yes. > It would at least mean XMPP RTT would now have a basic mechanism of > discovering whether the other end supports RTT, and being able to restrain > from sending RTT if the other end does not support RTT. This would not be the > complete session negotiation algorithm, but would allay the cheif concern of > Kurt. Correct, and it would allow for fall back to unextended XMPP if RTT was not available end-to-end, which I would think quite important in emergency and deaf communications. -- Kurt
Re: [Standards] RTT: no negotiation of the feature
2011/6/24 Remko Tronçon > Hi Mark, > > On Fri, Jun 24, 2011 at 5:15 PM, Mark Rejhon wrote: > > The earlier spec covered feature negotiation via XEP-0020. However, it > was > > encouraged by a few people that spec simplification became more > important, > > and to focus chiefly on the most basic, core interop issues, at least for > > the first published version of the specification. > > How about just using Disco/Caps discovery for now, as was suggested > earlier? I think this will make most people happy, and will only add a > few lines to the spec. See for example XEP-0085, section 4: > http://xmpp.org/extensions/xep-0085.html Kurt et al., would this be satisfactory in the short term? It would at least mean that XMPP RTT would now have a basic mechanism for discovering whether the other end supports RTT, and for refraining from sending RTT if the other end does not support it. This would not be the complete session negotiation algorithm, but it would allay Kurt's chief concern.
Re: [Standards] RTT: no negotiation of the feature
Hi Mark, On Fri, Jun 24, 2011 at 5:15 PM, Mark Rejhon wrote: > The earlier spec covered feature negotiation via XEP-0020. However, it was > encouraged by a few people that spec simplification became more important, > and to focus chiefly on the most basic, core interop issues, at least for > the first published version of the specification. How about just using Disco/Caps discovery for now, as was suggested earlier? I think this will make most people happy, and will only add a few lines to the spec. See for example XEP-0085, section 4: http://xmpp.org/extensions/xep-0085.html cheers, Remko
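Editorial sketch of the Disco-based check suggested above: before sending RTT, a client looks for a feature entry in the other end's disco#info result. The feature name `urn:example:rtt` and the helper names are hypothetical, since the thread does not define a feature name for RTT; only the disco#info namespace and the chat states feature are taken from existing XEPs.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
# Hypothetical feature name; the thread does not define one for RTT.
RTT_FEATURE = "urn:example:rtt"

# Illustrative disco#info result; addresses and id are made up.
EXAMPLE_RESULT = """
<iq type='result' from='juliet@example.com/balcony' id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='http://jabber.org/protocol/chatstates'/>
    <feature var='urn:example:rtt'/>
  </query>
</iq>
"""


def supports_feature(disco_result_xml: str, feature: str) -> bool:
    """Return True if a disco#info result advertises the given feature."""
    iq = ET.fromstring(disco_result_xml.strip())
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    return any(
        element.get("var") == feature
        for element in query.findall(f"{{{DISCO_INFO_NS}}}feature")
    )


def may_send_rtt(disco_result_xml: str) -> bool:
    """Only send RTT when the other end has advertised support."""
    return supports_feature(disco_result_xml, RTT_FEATURE)
```

With a check like this, an operator that strips or withholds the feature from the advertisement would keep conforming clients from sending RTT at all.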
Re: [Standards] RTT: no negotiation of the feature
On Jun 24, 2011, at 8:00 AM, Mark Rejhon wrote: > On Fri, Jun 24, 2011 at 10:51 AM, Kurt Zeilenga > wrote: > I should note that we'll kill this one way or the other, even if there's no > negotiation. I just rather kill it by disrupting the negotiation. > > I just think it's really bad form to have non-negoiated extensions. > > It's not a 100% non-negotiated extension. It's used before negotiated. That's bad form. > Negotiation is simply optional, and not documented in the specification. > Accept is done by continuing RTT by replying to event='start' with an > event='start'. This means that you have to implement the extension to stop it, as opposed to simply not advertising (or disrupting the advertisement) of the extension for it not to be used. > Reject be done by rejecting an attempted event='start' with an event='stop' > from the other end. > > I also must point out that at least one 911 systems integrator is already > testing XMPP RTT, as a long-term replacement for deaf TDD/TTY, as a companion > to RFC4103 / T.140 which is also considered for this use too as well. Thus, > servers are encouraged to stick to server policy (i.e. bandwidth > rate-limiting algorithms) rather than blocking the extension. Rate-limits kick in too late, the damage would already have been done. We need to stop the originator from putting the traffic onto the network. -- Kurt
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 10:58 AM, Kevin Smith wrote: > > Servers can inject event='stop' to achieve the same thing. Problem > solved? > > (We don't condone this. XMPP RTT may interoperate with 911 services, as a > > replacement for deaf TDD/TTY legal requirements.) > > In that case, surely negotiation is vital and urgent? > The earlier spec covered feature negotiation via XEP-0020. However, it was encouraged by a few people that spec simplification became more important, and to focus chiefly on the most basic, core interop issues, at least for the first published version of the specification. We can agree to accelerating some kind of a standards-compliant session negotiation quickly (i.e. less than a month from now). But more than one company already developed proprietary variations of real time text over XMPP (all of which were inferior to XMPP RTT), and I finally successfully convinced one of them to switch to my XMPP RTT standard. Note: If anybody needs to block XMPP RTT abuse, they should do it via bandwidth policy, not via extension-blocking policy, due to the possible use of XMPP RTT by deaf individuals (assistive act violations) and 911 (emergency accessibility). Also, XMPP RTT is also used on mobile phones. Carefully done, by a slow cellphone typist, XMPP RTT only uses 150 bytes a second (UTF-8 XML bytes, excluding TCP/IP overhead). Not a problem even for GPRS.
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 10:51 AM, Kurt Zeilenga wrote: > I should note that we'll kill this one way or the other, even if there's no > negotiation. I just rather kill it by disrupting the negotiation. > > I just think it's really bad form to have non-negoiated extensions. > It's not a 100% non-negotiated extension. Negotiation is simply optional, and not documented in the specification. Accepting is done by continuing RTT, replying to event='start' with an event='start'. Rejecting is done by answering an attempted event='start' with an event='stop' from the other end. I must also point out that at least one 911 systems integrator is already testing XMPP RTT as a long-term replacement for deaf TDD/TTY, as a companion to RFC4103 / T.140, which is also being considered for this use. Thus, servers are encouraged to stick to server policy (i.e. bandwidth rate-limiting algorithms) rather than blocking the extension.
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 3:55 PM, Mark Rejhon wrote: > On Fri, Jun 24, 2011 at 10:48 AM, Kurt Zeilenga > wrote: >> >> Certain operational networks we support cannot deal with the extra >> traffic. >> >> > That said, I agree -- it is going to be added to the spec in due time, >> > well before XMPP RTT clients become popular. >> >> I see reason why a basic yes/no negotiation of the extension cannot be >> added now, whether by iq or by caps or some other appropriate mechanism. If >> it needs to change later, fine. But please add something now. > > Servers can inject event='stop' to achieve the same thing. Problem solved? > (We don't condone this. XMPP RTT may interoperate with 911 services, as a > replacement for deaf TDD/TTY legal requirements.) In that case, surely negotiation is vital and urgent? /K
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 10:48 AM, Kurt Zeilenga wrote: > Certain operational networks we support cannot deal with the extra traffic. > > > That said, I agree -- it is going to be added to the spec in due time, > well before XMPP RTT clients become popular. > > I see reason why a basic yes/no negotiation of the extension cannot be > added now, whether by iq or by caps or some other appropriate mechanism. If > it needs to change later, fine. But please add something now. > Servers can inject event='stop' to achieve the same thing. Problem solved? (We don't condone this. XMPP RTT may interoperate with 911 services, as a replacement for deaf TDD/TTY legal requirements.)
Re: [Standards] RTT: no negotiation of the feature
I should note that we'll kill this one way or the other, even if there's no negotiation. I'd just rather kill it by disrupting the negotiation. I just think it's really bad form to have non-negotiated extensions. -- Kurt On Jun 24, 2011, at 7:48 AM, Kurt Zeilenga wrote: > > On Jun 24, 2011, at 7:44 AM, Mark Rejhon wrote: > >> On Fri, Jun 24, 2011 at 10:26 AM, Kurt Zeilenga >> wrote: >> We need to keep experiments from harming our production networks. If this >> extensions gets used blindly, it might well trash some production network. >> >> I don't care how this feature is negotiated between the two entities >> intended to experiment with it, I only care that I have some ability to >> disrupt that negotiation so I can prevent this extensions use and hence >> protect my network from the real harm that would come by its use on my >> network. >> >> We should keep this in perspective: This is XMPP RTT, not in-band >> bytestreams (i.e. XEP-0096 file transfer). It is low bandwidth the vast >> majority of the time, and contributes no additional data when nobody is >> typing. > > Certain operational networks we support cannot deal with the extra traffic. > >> That said, I agree -- it is going to be added to the spec in due time, well >> before XMPP RTT clients become popular. > > I see reason why a basic yes/no negotiation of the extension cannot be added > now, whether by iq or by caps or some other appropriate mechanism. > > If it needs to change later, fine. But please add something now. > > -- Kurt >
Re: [Standards] RTT: no negotiation of the feature
On Jun 24, 2011, at 7:44 AM, Mark Rejhon wrote: > On Fri, Jun 24, 2011 at 10:26 AM, Kurt Zeilenga > wrote: > We need to keep experiments from harming our production networks. If this > extensions gets used blindly, it might well trash some production network. > > I don't care how this feature is negotiated between the two entities intended > to experiment with it, I only care that I have some ability to disrupt that > negotiation so I can prevent this extensions use and hence protect my network > from the real harm that would come by its use on my network. > > We should keep this in perspective: This is XMPP RTT, not in-band bytestreams > (i.e. XEP-0096 file transfer). It is low bandwidth the vast majority of the > time, and contributes no additional data when nobody is typing. Certain operational networks we support cannot deal with the extra traffic. > That said, I agree -- it is going to be added to the spec in due time, well > before XMPP RTT clients become popular. I see no reason why a basic yes/no negotiation of the extension cannot be added now, whether by iq or by caps or some other appropriate mechanism. If it needs to change later, fine. But please add something now. -- Kurt
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 10:26 AM, Kurt Zeilenga wrote: > We need to keep experiments from harming our production networks. If this > extensions gets used blindly, it might well trash some production network. > > I don't care how this feature is negotiated between the two entities > intended to experiment with it, I only care that I have some ability to > disrupt that negotiation so I can prevent this extensions use and hence > protect my network from the real harm that would come by its use on my > network. > We should keep this in perspective: This is XMPP RTT, not in-band bytestreams (i.e. XEP-0096 file transfer). It is low bandwidth the vast majority of the time, and contributes no additional data when nobody is typing. That said, I agree -- it is going to be added to the spec in due time, well before XMPP RTT clients become popular.
Re: [Standards] RTT, take 2
On Fri, Jun 24, 2011 at 4:08 AM, Dave Cridland wrote: > 1) Processing software may have decoded the UTF-8 into "something", making > it awkward to manage. > > 2) Referring to UTF-8 octets means we have silly states where we could edit > inside characters. It's even possible this may be used intentionally, in > some languages. > > So I'd say that we should refer to characters in a string, and deal with > Unicode code-points in the abstract. I'd expect that implementations would > convert this internally into whatever made sense for them. > That's what I already did in v0.0.2 of the specification, but it made it necessary to explain which string format is meant, and so to say that it is based on UTF16 strings. Unfortunately, the same string has a different length depending on the programming language's native Unicode storage format (not the wire transmission format before XML processing): UTF8: String.Length("Québec") == 7 UTF16: String.Length("Québec") == 6 Now, when we start using Chinese characters outside the BMP, UTF16 and UCS4 also diverge for exactly the same Chinese character: UTF16: String.Length("#") == 2 UCS4: String.Length("#") == 1 This plays unfortunate havoc with String.Insert and String.Delete operations, and it hurts interoperability. Therefore, we have to go with a consistent method, and that is why we had to go to "Unicode code points".
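Editorial sketch of the length divergence described above, assuming the composed form of "é" and using U+20000 as a stand-in for a Chinese character outside the BMP (the character in the original message did not survive archiving). The helper names are hypothetical. Python strings index by code point, so `len()` gives the code point count directly.

```python
def utf8_length(text: str) -> int:
    """Number of UTF-8 octets needed to store the text."""
    return len(text.encode("utf-8"))


def utf16_length(text: str) -> int:
    """Number of 16-bit UTF-16 code units (surrogate pairs count as two)."""
    return len(text.encode("utf-16-le")) // 2


def code_point_length(text: str) -> int:
    """Number of Unicode code points."""
    return len(text)


quebec = "Qu\u00e9bec"       # "Québec" with a composed e-acute
outside_bmp = "\U00020000"   # assumed example: one CJK character outside the BMP

for label, sample in (("Quebec", quebec), ("outside BMP", outside_bmp)):
    print(label, utf8_length(sample), utf16_length(sample), code_point_length(sample))
```

An insert or delete position counted in one of these units lands in a different place when the receiver counts in another, which is the interoperability problem the message describes.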
Re: [Standards] RTT, take 2
2011/6/24 Remko Tronçon > I'm wondering whether 'code points' are any better than UTF-8 based > positioning. Isn't it possible that a codepoint position also points > inside a character/glyph/...? That is correct, and yes it is intentionally possible, for reasons explained in my other reply to you today.
Re: [Standards] RTT: no negotiation of the feature
On Jun 24, 2011, at 7:10 AM, Mark Rejhon wrote: > Re: http://www.xmpp.org/extensions/inbox/realtimetext.html > > On Fri, Jun 24, 2011 at 9:44 AM, Kurt Zeilenga > wrote: > I am quite concerned that the current spec offers zero negotiation of the > extension before its use. > I urge the authors to add some negotiation, preferable before it's published > as XEP. > > I agree, we are going to be developing a session negotiation mechanism over > time: > However, it is not necessary for interoperability right now: > > There was a negotiation mechanism in the previous spec, but it was claimed to > be overly complicated. Due to section 4.3.1 (backwards compatible), it is not > necessary to even use 'start' or 'stop' since RTT clients can work without > 'start' and 'stop'. A sender can send RTT right away, and a recipient can > interpret RTT right away. > > Experimentation during the Experimental stage is needed to determine best > interoperability for the process of starting a real-time-text session We need to keep experiments from harming our production networks. If this extensions gets used blindly, it might well trash some production network. I don't care how this feature is negotiated between the two entities intended to experiment with it, I only care that I have some ability to disrupt that negotiation so I can prevent this extensions use and hence protect my network from the real harm that would come by its use on my network. -- Kurt > and signalling the remote end that a session has started (in the future, it > might be a process where one end starts a session, and the other end does an > Accept/Reject -- similiar to AOL AIM Real Time IM. Or it might be a > different preferred method of starting a RTT session). It is also a "out in > the field" user preference that might influence the preferred session > negotiation algorithm, and several companies (4) are already working on XMPP > RTT based on this standard. 
Due to section 4.3.1, failure of signalling is > not a catastrophe at this early experimental stage, RTT will simply be turned > off but the chat conversation will continue to function normally. > > I covered some of this discussion in the "Supplemental" document at > www.realjabber.org as a potential candidate mechanism to mimic the AIM > Real-Time IM capability. > > Mark Rejhon
Re: [Standards] RTT, take 2
Regarding: http://xmpp.org/extensions/inbox/realtimetext.html (Replies to Remko Tronçon, David Cridland) On Fri, Jun 24, 2011 at 9:04 AM, Florian Zeitz wrote: > On 24.06.2011 11:24, Remko Tronçon wrote: > > I'm wondering whether 'code points' are any better than UTF-8 based > > positioning. Isn't it possible that a codepoint position also points > > inside a character/glyph/...? Peter could probably shed some light on > > this. > > > FWIW, I think using codepoints solves somewhat different problem. > If we count codepoints we can delete "half a character", e.g. remove the > "combining cedilla" from ç, but if we count UTF-(8,16) based we can > delete "half a codepoint" rendering the result undecodeable which is far > worse. > Florian is correct -- this is one of the many reasons why we don't want to use "UTF-8 counting methodology" for indexes and lengths for XMPP RTT real-time editing (text inserts and deletes). Interoperability between slightly buggy clients in UTF-8 can be much worse. On Fri, Jun 24, 2011 at 5:38 AM, Dave Cridland wrote: > As in, adding a "C" character at the fifth code-point of "Tronçon" might > give you "TroncÇon", or "TronçCon", depending on whether "ç" is a > "c-with-cedilla" or a "c" followed by a "combining cedilla"? > > Yes, I'm quite sure that's possible. > Real-time editing worked fine in both cases, due to section 5.2.1 "Monitoring Message Edits". The pre-edit string is compared to the post-edit string, in order to determine what code points changed. Although I did not publish the algorithm, the algorithm to do so is actually simpler than most think -- 50 lines of code (l.340-390 of RealTimeText.cs of the RealJabber open source). By left/right scanning for unchanged characters (even if the length has changed), you find the changed section in the middle of the string and extract that out. 
It works even with pastes, auto-spellcheckers, auto-accenting, complex multi-keypress keyboard entry (multiple dead characters) because we aren't worried about the input method, but only worried about how the message changed. Which is why I added section 5.2.1 to Implementor Notes. "Monitoring Message Edits" which is recommended instead of monitoring individual keypresses. In fact you can use any operating systems' textbox and let the operating system worry about presentation, which is why we aren't worried about counting individual glyphs (besides, we have no control over counting glyphs with most GUI frameworks) > I don't have a solution, either, except to note that this applies to UTF-8 > octets etc as well, unless you normalize all strings first - but then it's > really not clear to me how to translate editing actions in a GUI into that > form. > The editing actions need to be executed before normalizing, because there is not a consistent standard of normalization between different platforms. This is an additional reason we don't count based on glyphs, too. One platform may display a glyph as 2 characters, and another platform as 1 character. The method we chose, solves that problem. Mark Rejhon
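Editorial sketch of the left/right scan described above: compare the pre-edit and post-edit strings, skip the unchanged prefix and suffix, and report what changed in the middle. This is not the RealJabber code (which is C#); the function names and the returned tuple are assumptions made for illustration. Positions and counts are in code points, because Python strings index by code point.

```python
from typing import Tuple


def find_edit(before: str, after: str) -> Tuple[int, int, str]:
    """Return (position, deleted_count, inserted_text) turning before into after."""
    limit = min(len(before), len(after))

    # Scan from the left for unchanged code points.
    prefix = 0
    while prefix < limit and before[prefix] == after[prefix]:
        prefix += 1

    # Scan from the right, without overlapping the prefix already matched.
    suffix = 0
    while (
        suffix < limit - prefix
        and before[len(before) - 1 - suffix] == after[len(after) - 1 - suffix]
    ):
        suffix += 1

    deleted_count = len(before) - prefix - suffix
    inserted_text = after[prefix:len(after) - suffix]
    return prefix, deleted_count, inserted_text


def apply_edit(text: str, position: int, deleted_count: int, inserted_text: str) -> str:
    """Apply the edit on the receiving side."""
    return text[:position] + inserted_text + text[position + deleted_count:]
```

Because only the before and after strings are compared, the input method (paste, spell-check correction, dead keys) does not matter; for example, `find_edit("Hello wrld", "Hello world")` reports a single insertion of "o" at position 7.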
Re: [Standards] RTT: no negotiation of the feature
On Fri, Jun 24, 2011 at 3:10 PM, Mark Rejhon wrote: > Re: http://www.xmpp.org/extensions/inbox/realtimetext.html > On Fri, Jun 24, 2011 at 9:44 AM, Kurt Zeilenga > wrote: >> >> I am quite concerned that the current spec offers zero negotiation of the >> extension before its use. >> I urge the authors to add some negotiation, preferable before it's >> published as XEP. > > I agree, we are going to be developing a session negotiation mechanism over > time: I think a sensible baseline, as I noted in the other thread (to which I need to reply again) is using caps to signal support. This at least does away with the most terrible "Doesn't support it, but gets spammed anyway" case. /K
Re: [Standards] RTT: no negotiation of the feature
Re: http://www.xmpp.org/extensions/inbox/realtimetext.html On Fri, Jun 24, 2011 at 9:44 AM, Kurt Zeilenga wrote: > I am quite concerned that the current spec offers zero negotiation of the > extension before its use. > I urge the authors to add some negotiation, preferable before it's > published as XEP. > I agree; we are going to develop a session negotiation mechanism over time. However, it is not necessary for interoperability right now. There was a negotiation mechanism in the previous spec, but it was claimed to be overly complicated. Due to section 4.3.1 (backwards compatible), it is not even necessary to use 'start' or 'stop', since RTT clients can work without them. A sender can send RTT right away, and a recipient can interpret RTT right away. Experimentation during the Experimental stage is needed to determine the most interoperable process for starting a real-time-text session and signalling to the remote end that a session has started (in the future, it might be a process where one end starts a session and the other end does an Accept/Reject, similar to AOL AIM Real Time IM; or it might be a different preferred method of starting an RTT session). User preference "out in the field" might also influence the preferred session negotiation algorithm, and several companies (4) are already working on XMPP RTT based on this standard. Due to section 4.3.1, failure of signalling is not a catastrophe at this early experimental stage: RTT will simply be turned off, but the chat conversation will continue to function normally. I covered some of this discussion in the "Supplemental" document at www.realjabber.org as a potential candidate mechanism to mimic the AIM Real-Time IM capability. Mark Rejhon
[Standards] RTT: no negotiation of the feature
I am quite concerned that the current spec offers zero negotiation of the extension before its use. I urge the authors to add some negotiation, preferably before it's published as a XEP. -- Kurt
Re: [Standards] RTT, take 2
On Jun 24, 2011, at 6:04 AM, Florian Zeitz wrote: > On 24.06.2011 11:24, Remko Tronçon wrote: >>> So I'd say that we should refer to characters in a string, and deal with >>> Unicode code-points in the abstract. >> >> I'm wondering whether 'code points' are any better than UTF-8 based >> positioning. Isn't it possible that a codepoint position also points >> inside a character/glyph/...? Peter could probably shed some light on >> this. >> > FWIW, I think using codepoints solves somewhat different problem. > > If we count codepoints we can delete "half a character", e.g. remove the > "combining cedilla" from ç, but if we count UTF-(8,16) based we can > delete "half a codepoint" rendering the result undecodeable which is far > worse. The protocol ought to be defined in wire terms… but state a few guidelines on handling of characters composed of multiple code points. For instance, if a character is sent as a base code point followed by Y (Y being a combining character), I have little problem with Y being edited away so long as the base code point by itself is valid… or with Y being replaced with another combining character without touching the base code point. It's my view that the client needs to be aware enough of what's happening in the GUI and on the wire to ensure both are sane. If you try to design this such that clients don't have to be aware of what is really going on on the wire or in the GUI, it will be quite fragile and prone to interoperability problems. -- Kurt
Re: [Standards] RTT, take 2
On 24.06.2011 11:24, Remko Tronçon wrote: >> So I'd say that we should refer to characters in a string, and deal with >> Unicode code-points in the abstract. > > I'm wondering whether 'code points' are any better than UTF-8 based > positioning. Isn't it possible that a codepoint position also points > inside a character/glyph/...? Peter could probably shed some light on > this. > FWIW, I think using codepoints solves somewhat different problem. If we count codepoints we can delete "half a character", e.g. remove the "combining cedilla" from ç, but if we count UTF-(8,16) based we can delete "half a codepoint" rendering the result undecodeable which is far worse.
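Editorial sketch of the two failure modes contrasted above, using "ç" as in the message. Deleting by code point can at worst remove the combining mark and leave a valid string; deleting by UTF-8 octet can cut a code point in half and leave bytes that no longer decode. The helper names are hypothetical.

```python
def delete_code_points(text: str, position: int, count: int) -> str:
    """Delete `count` code points starting at code point index `position`."""
    return text[:position] + text[position + count:]


def delete_utf8_octets(text: str, position: int, count: int) -> bytes:
    """Delete `count` octets starting at octet index `position` of the UTF-8 form."""
    data = text.encode("utf-8")
    return data[:position] + data[position + count:]


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


decomposed = "c\u0327"  # "c" followed by COMBINING CEDILLA
composed = "\u00e7"     # LATIN SMALL LETTER C WITH CEDILLA, two octets in UTF-8

half_character = delete_code_points(decomposed, 1, 1)  # leaves a plain "c"
half_code_point = delete_utf8_octets(composed, 1, 1)   # leaves a lone lead octet

print(repr(half_character), is_valid_utf8(half_character.encode("utf-8")))
print(repr(half_code_point), is_valid_utf8(half_code_point))
```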
Re: [Standards] RTT, take 2
I'm wondering whether 'code points' are any better than UTF-8 based > positioning. Isn't it possible that a codepoint position also points > inside a character/glyph/...? A codepoint is the fundamental thing defined by Unicode, but there is a related concept which could be called a character (or grapheme?), consisting of one or more codepoints (a codepoint representing a non-combining character, followed by zero or more codepoints representing combining characters). Yes, this is why counting Unicode code points is the solution. But it needs to be done at a sufficiently low level, close to the transmission of messages. For example, when the user erases one combined character consisting of two code points, the user interface action should, at a low level, result in the erasure of two code points. That fact can be captured and sent in the RTT erasure element with an order to erase two code points. The receiving client has its received RTT messages as reference, performs the action on the received string, and then takes the result to presentation. The operation is then independent of any local Unicode habits in the receiving environment. Two code points are still two code points at that level, and the operation can be done without ambiguity. Gunnar
Re: [Standards] RTT, take 2 -network load
Dave Cridland wrote 2011-06-24 11:01: I'd like to see, somewhere in this document, a discussion about network load, and a consideration that clients (and possibly servers) MAY, or possibly SHOULD, disable RTT if network conditions deteriorate. There is a very brief discussion along that line in section 4.4. The recommended transmission interval of one second, with the characters smoothed out in presentation during that second, is a good first guarantee against congestion. This is very different from character-by-character transmission technologies, which can cause very high network load with rapid typists. So, the load is a maximum of one message per second from each active typist, about 5 kB/s. If character-by-character transmission were used, it could end up at 20 messages per second and about 100 kB/s. So it is a huge difference. How much mean load does a message-wise IM participant cause? Is it one message every 10 seconds and 1 kB/s? Maybe a factor of 5-10 less than the RTT user. It is a good habit to include a "Congestion considerations" section in this kind of specification. So, let us aim to create one. I see that the following parts can be created: -Server congestion considerations -Client load considerations -Multiparty load considerations What good bases for such discussions do you suggest referring to? /Gunnar
Re: [Standards] RTT, take 2
On Fri, 24 Jun 2011 at 11:24:50 +0200, Remko Tronçon wrote: > > So I'd say that we should refer to characters in a string, and deal with > > Unicode code-points in the abstract. > > I'm wondering whether 'code points' are any better than UTF-8 based > positioning. Isn't it possible that a codepoint position also points > inside a character/glyph/...? A codepoint is the fundamental thing defined by Unicode, but there is a related concept which could be called a character (or grapheme?), consisting of one or more codepoints (a codepoint representing a non-combining character, followed by zero or more codepoints representing combining characters). (A glyph is something different, and as far as I can tell is only interesting if you make fonts or font-rendering algorithms.) In UTF-8 a codepoint is one or more bytes, in UTF-16 a codepoint is either one or two 16-bit words, and in UCS-4 a codepoint is one 32-bit word. Here are some codepoints: * U+0041 LATIN CAPITAL LETTER A * U+00C1 LATIN CAPITAL LETTER A WITH ACUTE * U+0301 COMBINING ACUTE ACCENT The grapheme Á could either be written as U+0041 U+0301 (decomposed form), or U+00C1 (composed form). Not all graphemes have a composed form. > For example, in Qt, this would most likely be > implemented using a QTextCursor ( > http://doc.trolltech.com/4.7/qtextcursor.html ). However, the text > talks about 'positioning at character X', and it doesn't seem to be > defined what this means. That might either be counting graphemes or codepoints, depending... S
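The composed and decomposed forms above, and their sizes in each encoding, can be inspected with Python's standard `unicodedata` module. This is only an illustrative sketch; the helper name `sizes` is hypothetical.

```python
import unicodedata

composed = "\u00C1"      # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "A\u0301"   # U+0041 followed by U+0301 COMBINING ACUTE ACCENT


def sizes(text: str) -> dict:
    """Count code points and encoded units for a string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_words": len(text.encode("utf-16-le")) // 2,
        "utf32_words": len(text.encode("utf-32-le")) // 4,
    }


composed_sizes = sizes(composed)
decomposed_sizes = sizes(decomposed)

# Normalization converts between the two forms of the same grapheme.
to_composed = unicodedata.normalize("NFC", decomposed)
to_decomposed = unicodedata.normalize("NFD", composed)
```

The two strings display identically but have different code point counts, so a position or length only has a stable meaning if both ends count the same thing.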
Re: [Standards] RTT, take 2
On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote: > So I'd say that we should refer to characters in a string, and deal with > Unicode code-points in the abstract. I'm wondering whether 'code points' are any better than UTF-8 based positioning. Isn't it possible that a codepoint position also points inside a character/glyph/...? Peter could probably shed some light on this. As in, adding a "C" character at the fifth code-point of "Tronçon" might give you "TroncÇon", or "TronçCon", depending on whether "ç" is a "c-with-cedilla" or a "c" followed by a "combining cedilla"? Yes, I'm quite sure that's possible. I don't have a solution, either, except to note that this applies to UTF-8 octets etc as well, unless you normalize all strings first - but then it's really not clear to me how to translate editing actions in a GUI into that form. Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
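The "Tronçon" case above can be reproduced with a short sketch. It assumes that "the fifth code-point" means inserting after the first five code points; the helper `insert_at` is hypothetical.

```python
import unicodedata


def insert_at(text: str, index: int, fragment: str) -> str:
    """Insert `fragment` after the first `index` code points of `text`."""
    return text[:index] + fragment + text[index:]


composed = "Tron\u00E7on"                            # c-with-cedilla as one code point
decomposed = unicodedata.normalize("NFD", composed)  # "c" + U+0327 COMBINING CEDILLA

in_composed = insert_at(composed, 5, "C")
in_decomposed = insert_at(decomposed, 5, "C")

# In the decomposed string the inserted "C" lands between the "c" and its
# combining cedilla, so the cedilla now attaches to the "C" instead.
recomposed = unicodedata.normalize("NFC", in_decomposed)
```

The same index gives "TronçCon" for the composed input and "TroncÇon" for the decomposed one, which is why both ends need to agree on the exact code point sequence being edited.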
Re: [Standards] RTT, take 2
> So I'd say that we should refer to characters in a string, and deal with > Unicode code-points in the abstract. I'm wondering whether 'code points' are any better than UTF-8 based positioning. Isn't it possible that a codepoint position also points inside a character/glyph/...? Peter could probably shed some light on this. The major problem is that you want something that you can tell your GUI "remove N characters", but that such an operation is very toolkit-specific and not well specified, and that you don't have any control over this. For example, in Qt, this would most likely be implemented using a QTextCursor ( http://doc.trolltech.com/4.7/qtextcursor.html ). However, the text talks about 'positioning at character X', and it doesn't seem to be defined what this means. I think that deleting one 'character' using this API would potentially delete multiple Unicode code points? (or maybe I don't know enough about Unicode). But if my understanding is correct, then I'm not sure if such a positioning-based API would ever work in practice (for multiple implementations). cheers, Remko
Re: [Standards] RTT, take 2
On Wed Jun 22 16:52:53 2011, Kevin Smith wrote: I've performed a quick review of the new proposal. I have a handful of comments on the spec; I don't currently intend these to be blocking, for my part, when Council vote to Experimental. I consider this a vast improvement over the first proposed version of the document. Just to add... The nice trip down memory lane in Section 1 paints a rather rosy picture, I think. Since I was actually about, and using the net, in those days, I feel a flashback coming on. The biggest problem for a lot of these systems was the lag and network load they generated. This is evidenced in the way that Nagle's algorithm is the default in BSD derived socket stacks, for instance. Most of the talkers switched to using line buffering, Internet BBS developed clients (and/or CLients, depending on whether you were DOC or YAWC) which provided local editing facilities. C{lL}ient connections often got in ahead of the queue (remember queueing? No, of course not...) because of the vastly lower network load they generated, and people used them because of the vastly improved user experience of local echo - remote echo being not only more painful on its own, but in no small part due to the network load, latencies of 30 seconds or more were quite common. I'd like to see, somewhere in this document, a discussion about network load, and a consideration that clients (and possibly servers) MAY, or possibly SHOULD, disable RTT if network conditions deteriorate. Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
Re: [Standards] RTT, take 2
Remko Tronçon wrote: [ I don't like writing me-too e-mails, but you beat me by a minute to sending the exact same mail, so I'm doing it anyway ;-) ] So I'd say that we should refer to characters in a string, and deal with Unicode code-points in the abstract. I'd expect that implementations would convert this internally into whatever made sense for them. I think it would be the first protocol to depend on knowing how to count code points (I haven't needed it before), but I also think it's the only sensible thing to do, because you could end up with incorrect encodings using the protocol otherwise. Anyway, for applications that don't use Unicode libraries, rolling your own codepoint count isn't very hard, at least for utf-8. We just need a concise way to tell lengths and positions within the Unicode string. With Unicode, some characters can be composed of other characters. The word "characters" alone therefore risks being ambiguous and needs clarification. RFC 5198 Network Unicode says: "Unicode identifies each character by an integer, called its "code point", in the range 0-0x10ffff. These integers can be encoded into byte sequences for transmission in at least three standard and generally-recognized encoding forms, all of which are completely defined in The Unicode Standard and the documents cited below:" It is this "Unicode code point" that is meant in the length and position parameters in this specification, as any representation of the Unicode character number. With RFC 5198 using both "character" and "code point", and "character" being slightly ambiguous, I suggest using the term "Unicode code point". cheers, Gunnar
Re: [Standards] RTT, take 2
[ I don't like writing me-too e-mails, but you beat me by a minute to sending the exact same mail, so I'm doing it anyway ;-) ] > So I'd say that we should refer to characters in a string, and deal with > Unicode code-points in the abstract. I'd expect that implementations would > convert this internally into whatever made sense for them. I think it would be the first protocol to depend on knowing how to count code points (I haven't needed it before), but I also think it's the only sensible thing to do, because you could end up with incorrect encodings using the protocol otherwise. Anyway, for applications that don't use Unicode libraries, rolling your own codepoint count isn't very hard, at least for utf-8. cheers, Remko
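As a rough illustration of how little is needed to roll your own code point count for UTF-8: every code point contributes exactly one byte that is not a continuation byte (continuation bytes have the bit pattern `10xxxxxx`). The sketch below assumes the input is already valid UTF-8 and does no validation; the function name is hypothetical.

```python
def count_code_points_utf8(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Assumes `data` is well-formed UTF-8; malformed input is not detected.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


name = "Tron\u00E7on".encode("utf-8")   # 8 bytes, 7 code points
accented = "A\u0301".encode("utf-8")    # 3 bytes, 2 code points
astral = "\U0001F600".encode("utf-8")   # 4 bytes, 1 code point

name_count = count_code_points_utf8(name)
accented_count = count_code_points_utf8(accented)
astral_count = count_code_points_utf8(astral)
```

A client that keeps its text as raw UTF-8 octets could use a loop like this to translate between wire positions counted in code points and its internal byte offsets.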
Re: [Standards] RTT, take 2
On Fri Jun 24 02:54:12 2011, Peter Saint-Andre wrote: On 6/23/11 12:41 AM, Mark Rejhon wrote: > Opinion? On the wire there is no such thing as a code point, there are only code points that are encoded using an encoding form like UTF-8 or UTF-16. For details, see: http://tools.ietf.org/html/draft-ietf-appsawg-rfc3536bis-02 Given that XMPP is pure UTF-8, I don't see a compelling reason to count UTF-16-encoded code points or UTF-32-encoded code points. I think UTF-16 and UTF-32 encodings would both be a bad idea; XMPP is purely UTF-8 as you say. However, I don't think that we should refer to UTF-8 octets either, here, for a number of reasons: 1) Processing software may have decoded the UTF-8 into "something", making it awkward to manage. 2) Referring to UTF-8 octets means we have silly states where we could edit inside characters. It's even possible this may be used intentionally, in some languages. So I'd say that we should refer to characters in a string, and deal with Unicode code-points in the abstract. I'd expect that implementations would convert this internally into whatever made sense for them. Dave. -- Dave Cridland - mailto:d...@cridland.net - xmpp:d...@dave.cridland.net - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/ - http://dave.cridland.net/ Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
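The "silly states" in point 2 can be made concrete with a small sketch: an offset counted in UTF-8 octets can land in the middle of a multi-byte sequence, while an offset counted in code points cannot. Both helper names below are hypothetical, and the input is assumed to be valid UTF-8.

```python
def is_code_point_boundary(data: bytes, offset: int) -> bool:
    """True if `offset` does not fall inside a multi-byte UTF-8 sequence."""
    if offset < 0 or offset > len(data):
        raise ValueError("offset is outside the data")
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


def byte_offset_of_code_point(text: str, index: int) -> int:
    """Convert a code point index into the matching UTF-8 octet offset."""
    return len(text[:index].encode("utf-8"))


text = "Tron\u00E7on"
octets = text.encode("utf-8")   # the c-with-cedilla occupies octets 4 and 5

inside_cedilla = is_code_point_boundary(octets, 5)   # splits the two-octet sequence
after_cedilla = byte_offset_of_code_point(text, 5)   # octet offset of the "o"
```

An implementation that stores decoded text can work with code point indexes directly, and one that stores raw octets can convert with a helper like the second function.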