Dear Denis,

Thanks for reviewing the UTF-8 proposal, and apologies for the late reply (I was out last week).

> On 30 Oct 2025, at 08:33, denis walker wrote:
>
> Colleagues
>
> There is a reason why this took 11+ years. A lot of the work needed for the
> technical change was considered and in some areas applied to the database
> years ago. That is why we don't have an NWI on UTF8. What we could not agree
> on was the policy aspect. Which attributes to allow UTF8 to be used with.
> Whether we should just allow UTF8 or keep the attribute as Latin1/ASCII and
> add an optional duplicate attribute in UTF8. This was brought up many times
> and no one would commit to anything. Now we have an agreement on how to move
> this forward. But now we don't agree that it should be a policy.
>
> In general where we apply rules or affect behaviour or mindset, we have done
> it with a policy. If it is a straightforward technical tweak, we can do it
> with an NWI. I hear what Angela says. These issues with UTF8 and contact
> methods do not impact on, or require any changes to, existing policies. But
> that statement does not rule out creating a new RIPE Database Policy to
> define these rules and behaviours. We seem to be in a position now where any
> change to the RIPE Database is considered to be a technical tweak in complete
> isolation of anything else. So everything is an NWI. If that was true, why
> are the "status:" attribute values defined in the Address Policy? Status is a
> database thing. If you want to change it then it is just a technical tweak.
> When we recently added 'ALLOCATED-ASSIGNED' why did we have to change policy?
> It is just database semantics. Why do we have an Abuse-c Policy? Like
> "status:", "abuse-c:" is just a database attribute. Either these are all
> policy issues, or none of them are. Let's not pick and choose so you can rush
> something through quickly.
>
> Similarly, if we are going to allow users to define their preferred method of
> being contacted and maybe have a mandatory method like email, or suggest that
> email is always offered, then this should also be defined in a RIPE Database
> Policy. Again this is not a technical tweak. It is about rules and behaviour.
>
> All of these issues define how elements of the registry are managed and used.
> Even if they require some technical tweak in order to implement the rules or
> enforce some behaviour.
>
> Now let's look in a little more detail about what we are agreeing with UTF8.
> As with all aspects of life in the 2020s, everyone is in a hurry to just 'get
> things done'. Headlines and sound bites are what most people make decisions
> on. Very few people have time for detail. That is always something for other
> people to look at. But if you like the headlines and your heads start to nod,
> decisions are made. Then detail becomes irrelevant...to you. We are making a
> habit these days of looking at issues within small bubbles, in complete
> isolation of the bigger picture. The consequences of your change can reach
> far beyond your little bubble.
>
> With regard to "remarks:", this is, and always has been, defined as free
> text. Absolutely anything can be included here. It has been an attribute in
> the database since the beginning, about 36 years ago. For most of that time
> it was never said this should not include any personal data. Some of these
> may contain personal data in "remarks:" attributes. But data can be written
> in UTF8 regardless of the data content. So I see no problem allowing UTF8 in
> "remarks:".
>
> The "descr:" attribute is very different to "remarks:".
>
> In the Impact Analysis it was said:
> Personal Data
> Users must not add personal data in “remarks:” or “descr:” attributes, as
> these attributes are not included in the daily limit accounting, are not
> validated as they contain free text, and are not filtered by default.
> This is already the case in the RIPE database and the introduction of UTF-8
> encoding does not change this. Personal data with UTF-8 encoding is out of scope.
>
> In the Operational update at RIPE-90 it was said:
> Allow UTF-8 in “descr:” and “remarks:” Attributes
> -Names and addresses NOT affected
>
> It is not correct to say it is already the case that the "descr:" attribute
> must not include personal data. This is exactly what the "descr:" attribute
> is. Again this attribute has existed since the beginning of the database. One
> of the early definitions of it can be found in RIPE-050 RIPE Database
> Template For Networks from 01 Apr 1992:

See this presentation from RIPE 80 for an example of why storing personal data in "descr:" is not a good idea (and the same applies to "remarks:"):
https://ripe80.ripe.net/presentations/39-RIPE-Database-and-GDPR-final.pdf

I agree that anything at all can be included in "descr:" or "remarks:", but personal data in either attribute will not be counted towards the daily query limit or filtered from query responses by default.

> inetnum:
>
> descr:
> Description of the network.
> Give organisation and place.
> Postal address is not needed, this can be found via the contacts.
> You can't send postal mail to a bunch of routers and transceivers,
> can you? The country is given in country:.
>
> Format: free text, one line per entry, multiple lines in sequence
> Example:
> descr: Network Bugs Feeding Facility
> descr: Terabit Labs Inc.
> descr: Northtown
> Mandatory
>
> For the last 36 years, operators have been adding the End User's name and
> location details into the "descr:" attributes. If you check the database for
> INETNUM and INET6NUM objects created during October 2025, you will see that
> most still include the name and location in these attributes. Every object
> type except PERSON, ROLE, KEY-CERT and IRT includes the "descr:" attribute.
> Across the database, in the applicable objects, there are in total:
>
> objects: 6637650
> descr: 7884770
>
> So we have almost 8m "descr:" attributes largely containing name and location
> details of, mostly, End Users operating public networks. Now the problem I
> have with this discussion and the conclusions being drawn is the mixing of
> UTF8 issues with those of personal data and privacy concerns. If we are
> talking about allowing UTF8 then let's stick to that topic. Do not mix it
> with privacy concerns. They are completely separate issues. You can apply
> UTF8 to "descr:" regardless of the data content.

This proposal is not intended to address issues regarding personal data. However, the intended change may be used to add personal data to attributes where it is not protected (for example, non-Latin names and addresses can be represented more accurately in UTF-8 than when transliterated to Latin-1). That is why the impact analysis states that personal data must not be stored in those attributes. There are other, more specific attributes that are better suited to personal data and that are also subject to filtering and the daily query limit.

I wanted to be clear that storing personal data in non-Latin scripts is out of scope and needs to be considered in a separate proposal.

> The current definition of "descr:" in the database documentation simply says
> "A short description related to the object.". But for many years it was more
> like RIPE-050 above. So resource holders were required to include the name
> and location details of a network user. They are still doing that today. Some
> of that is personal information. Probably no one has any idea how much of the
> "descr:" data is personal information. If we start pushing new rules about
> not including personal information in these attributes, resource holders may
> stop putting this information in these attributes. That could be quite
> damaging for some of the stakeholders of the RIPE Database.
> Now some may say we should not include personal data in these attributes. But
> we do not have a Business Requirements Document defining the business case for
> operating a public registry in the 2020s. So it is impossible to say if any of
> the exceptions in GDPR allow the registry to process this personal information.
> So can I suggest that as this conversation continues, and in any conclusions
> that are drawn, we focus on UTF8 and leave privacy for another thread.
>
> So in conclusion, I agree with allowing UTF8 in "remarks:" and "descr:"
> attributes, regardless of the data content of those attributes, but I think
> it should be defined in a RIPE Database Policy.

Thanks for your feedback.

Regards
Ed Shryane
RIPE NCC

> cheers
> denis
> -----

To unsubscribe from this mailing list or change your subscription options, please visit: https://mailman.ripe.net/mailman3/lists/db-wg.ripe.net/
As we have migrated to Mailman 3, you will need to create an account with the email matching your subscription before you can change your settings.
More details at: https://www.ripe.net/membership/mail/mailman-3-migration/
