On 12/23/11 6:31 PM, "ext joao.abeca...@nokia.com" <joao.abeca...@nokia.com> wrote:
>[ Re-trying after the previous massive quoting and line-wrap fail :-/ ]
>
>Denis Dzyubenko wrote:
>> 2011/12/9 João Abecasis <joao.abeca...@nokia.com>:
>> >> inline QUuid QUuid::createFromName(const QUuid &ns, const
>> >> QString &name)
>> >> {
>> >>     return createFromName(ns, name.toUtf8());
>> >> }
>> >
>> > would only be updated to call the right implementations, as
>> > appropriate.
>>
>> I like the current status of the patch very much.
>>
>> However I have one question - where does utf8 come from? Shouldn't it
>> be defined by rfc, and if not imo we shouldn't arbitrarily choose
>> encodings, and maybe leave the default one in - which is utf-16 for
>> QString
>
>This is my reasoning:
>
>1) As you mention the RFC doesn't specify encodings. In fact, it says
>the owner of a namespace is free to decide how it should be used. For
>this reason it's important that we support QByteArray as the canonical
>form and let users make conscious decisions.
>
>2) In Qt, strings of text are represented as QString so it would be nice
>to support QString-based names. This is the reason for adding those
>overloads as convenience API, but doesn't tell us how QString-based
>names should be translated to "a canonical sequence of octets" (quoting
>the standard).
>
>3) The point of name-based UUIDs is that you can regenerate the UUIDs
>knowing only the namespace UUID and a particular name. If you use the
>QByteArray version, it's up to you to ensure this. When using the QString
>version Qt needs to ensure it for you.
>
>This excludes locale- and system-dependent conversions, like
>toLocal8Bit(), it also excludes straightforward utf16() as it is
>dependent on endianness, and thus platform.
>
>4) UTF-8 is a good candidate because it is one possible "canonical
>sequence of octets". But it's mostly that, a good candidate.
>
>So, there isn't a reason why it *has* to be utf-8, but I haven't seen
>better alternatives. Other alternatives are toAscii or toLatin1, but
>they're lossy encodings. Network-byte order UTF-16?...
>
>Anyway, one use case mentioned in the standard makes this convenience
>approach very nice:
>
>    QUrl url;
>
>    // ...
>
>    // NameSpace_DNS from RFC4122
>    // {6ba7b810-9dad-11d1-80b4-00c04fd430c8}
>    QUuid nsDns(0x6ba7b810, 0x9dad, 0x11d1, 0x80, 0xb4,
>            0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8);
>
>    QUuid uuidForUrl = QUuid::createFromName(nsDns, url.toString());
>
>With the added benefit that in that use case it interoperates with
>Python.
>
>("And what does python do?", you ask. Well, it avoids the decision
>altogether and bails out on unicode strings. It only accepts
>byte strings:
>
>    $ python
>    Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
>    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>    Type "help", "copyright", "credits" or "license" for more information.
>    >>> import uuid
>    >>> uuid.NAMESPACE_DNS
>    UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
>    >>> uuid.uuid3(uuid.NAMESPACE_DNS, "www.widgets.com")
>    UUID('3d813cbb-47fb-32ba-91df-831e1593ac29')
>    >>> uuid.uuid3(uuid.NAMESPACE_DNS, u"www.widgets.com")
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/uuid.py", line 512, in uuid3
>        hash = md5(namespace.bytes + name).digest()
>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position 1: ordinal not in range(128)
>
>)
>
>What do others think?

I can see only two options that make sense. Either accept only ASCII
(i.e. code points smaller than 0x80), or use UTF-8. The first option is
a subset of the second one.

Cheers,
Lars
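
To make the encoding argument concrete, here is a minimal sketch in Python 3 (not the Python 2.6 shown in the quoted session). It assumes a version 3 (MD5) name-based UUID, as uuid.uuid3 produces; name_based_uuid is a hypothetical helper written for this illustration, not part of Qt or of the patch under discussion.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: bytes) -> uuid.UUID:
    """Hypothetical helper: version 3 UUID from a namespace and an explicit octet sequence."""
    # Hash the namespace's 16 bytes followed by the name octets, as in the quoted traceback.
    digest = hashlib.md5(namespace.bytes + name).digest()
    # version=3 makes uuid.UUID set the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=3)


name = "www.widgets.com"

# UTF-8 has a single byte order, so every platform hashes the same octets.
from_utf8 = name_based_uuid(uuid.NAMESPACE_DNS, name.encode("utf-8"))

# UTF-16 depends on byte order: the two encodings give different octets,
# and therefore different UUIDs for the same name.
from_utf16_le = name_based_uuid(uuid.NAMESPACE_DNS, name.encode("utf-16-le"))
from_utf16_be = name_based_uuid(uuid.NAMESPACE_DNS, name.encode("utf-16-be"))

print(from_utf8)                       # 3d813cbb-47fb-32ba-91df-831e1593ac29
print(from_utf16_le == from_utf16_be)  # False
```

The UTF-8 result matches the value from the quoted Python session because an ASCII-only name has the same bytes in ASCII and in UTF-8. That is also why accepting only ASCII is a subset of the UTF-8 option.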