Re: [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes

Job Snijders via db-wg Fri, 24 Nov 2023 01:21:52 -0800

Dear Edward,

On Fri, Nov 24, 2023 at 10:03:15AM +0100, Edward Shryane via db-wg wrote:
> Currently the RIPE database only allows a subset of ASCII characters
> in the "org-name:", "person:" and "role:" attributes, for a few
> reasons including:
> 
> * These attributes are also a look-up key and the Whois protocol does
>   not allow specifying character sets in queries.
> * RPSL names are ASCII according to RFC2622
> * Using a normalised name makes the object easier to query
> * Reading a normalised name is easier to interpret
> 
> However there are some drawbacks to forcing names to only use a subset
> of ASCII characters:
> 
> * Organisations, roles and persons cannot use their actual name if it
>   includes characters outside this subset.
> * Normalisation is not standard, but is an interpretation done by each
>   maintainer, e.g. characters could be excluded or converted in
>   different ways.


The above two points are key to making the RIPE database useful and
accessible to everyone; I too would love to see them addressed.

> Since we support the Latin-1 character set in the RIPE database, I
> propose we also allow non-ASCII Latin-1 characters in these
> attributes.
> 
> Querying for a name can be done either using the latin-1 characters
> (proposed) or a normalised, ASCII representation (currently). The
> normalised version will be generated by Whois and stored in a database
> index for querying. The primary key will also be generated from the
> normalised version.
> 
> Please let me know your feedback.
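
To make the "normalised version" idea concrete, here is a minimal sketch
of how a Latin-1 name could be reduced to an ASCII lookup key. It is an
illustration only, not the algorithm Whois uses or will use: the function
name normalise_name and the FALLBACK table are hypothetical, and the
choice of replacements (e.g. "ß" to "ss") is an assumption.

```python
import unicodedata

# Hypothetical replacements for Latin-1 letters that have no Unicode
# decomposition into a base letter plus accent (an assumption, not a standard).
FALLBACK = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def normalise_name(name: str) -> str:
    """Return an ASCII-only key for a name drawn from Latin-1."""
    # Reject input outside Latin-1; raises UnicodeEncodeError if so.
    name.encode("latin-1")
    out = []
    for ch in name:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFKD splits accented letters into base letter + combining mark;
        # keeping only the ASCII code points drops the mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


# A toy index: normalised key -> the name as entered by the maintainer.
index = {}
for entered in ["José Müller", "Søren Strauß"]:
    index[normalise_name(entered)] = entered

print(index)
```

With such an index, a query for either the entered name or its ASCII
form resolves to the same key once the query string is passed through
the same function.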

Wouldn't this be an opportune time to support UTF-8 instead of LATIN-1?
As I understand it, through the use of UTF-8 more languages could be
supported. UTF-8 seems to be the preferred character encoding in any new
IETF work (for good reason).
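
As a small illustration of the coverage difference, the sketch below
checks which sample names can be encoded in each character set. The
helper name encodable and the sample names are my own; "latin-1" here is
Python's codec for ISO-8859-1.

```python
def encodable(name: str, charset: str) -> bool:
    """Return True if every character of name exists in charset."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Sample names: German, Polish, Greek, Ukrainian (chosen for illustration).
samples = ["Müller", "Łukasz", "Αθήνα", "Київ"]

for name in samples:
    print(name,
          "latin-1:", encodable(name, "latin-1"),
          "utf-8:", encodable(name, "utf-8"))
```

Only the first name fits in LATIN-1; all four can be encoded in UTF-8.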

Have the effects of LATIN-1 on downstream applications such as NRTM v3
and NRTM v4 been considered?

You indicate that LATIN-1 is already supported in the RIPE database, so
I imagine you and the team have already weighed the pros and cons of
UTF-8 vs LATIN-1 and settled on this particular recommendation. I just
wanted to make sure to raise these questions. :-)

Some interesting reading material on UTF-8: https://utf8everywhere.org/

Kind regards,

Job

