Sean,

As a clarification, the UNI spec does list two on-disk formats.
This was done so tools could support both in the transition
from UTF-16LE with BOM to UTF-8 without BOM.

The strong recommendation is for all EDK II open source packages to
use UTF-8 without a BOM.  Since platform packages not maintained
in EDK II could be pulling forward UNI files in UTF-16LE, we
have not changed the UNI spec or tools to consider UTF-16LE
as unsupported.

Reviewing UNI file patches by email is a challenge when the files
are in UTF-16LE, so requiring UTF-8 without a BOM makes this much easier.

The EDK II open source package conversion to UTF-8 without a BOM
was performed in late 2015.  Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A Python helper script was added to perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py
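For anyone who just wants to see the idea, here is a minimal sketch of
the conversion itself.  This is not the code from ConvertUni.py; the
function name utf16le_bom_to_utf8 is made up for illustration, and it
assumes the whole file fits in memory:

```python
import codecs


def utf16le_bom_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE-with-BOM bytes to UTF-8 bytes without a BOM.

    Hypothetical helper for illustration only; not part of BaseTools.
    """
    # Refuse input that does not carry the UTF-16LE BOM (FF FE).
    if not data.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("input does not start with a UTF-16LE BOM")
    # Strip the BOM, decode the rest, and re-encode as plain UTF-8.
    text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return text.encode("utf-8")


# Example: a one-line UNI string definition round-tripped in memory.
sample = codecs.BOM_UTF16_LE + '#string STR_HELLO #language en-US "Hello"\n'.encode("utf-16-le")
converted = utf16le_bom_to_utf8(sample)
```

To convert a file on disk, read it in binary mode, pass the bytes
through the function, and write the result back in binary mode.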

At some point, it may make sense to *require* UTF-8 without a
BOM for all UNI files, and for all tools to reject UNI files
that are not in that format.
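
As a rough sketch of what such a check could look like (this is not
how the current tools behave; classify_uni_bytes and its return
strings are hypothetical names chosen for illustration):

```python
import codecs


def classify_uni_bytes(data: bytes) -> str:
    """Classify the on-disk encoding of UNI file contents.

    Hypothetical helper for illustration only; not part of BaseTools.
    """
    # Check for byte order marks first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-with-bom"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le-with-bom"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be-with-bom"
    # No BOM: accept only if the bytes decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


def is_utf8_without_bom(data: bytes) -> bool:
    """Return True only for the strictly required format."""
    return classify_uni_bytes(data) == "utf-8"
```

A tool or CI check that wanted to enforce the rule could read each
.uni file in binary mode and fail on anything where
is_utf8_without_bom() returns False.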

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-
> boun...@lists.01.org] On Behalf Of Sean Brogan via
> edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming <liming....@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Edk2 uni file encoding
> 
> Liming,
> That was exactly what I was looking for.
> 
> Thanks
> Sean
> 
> 
> 
> 
> -----Original Message-----
> From: Gao, Liming <liming....@intel.com>
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan <sean.bro...@microsoft.com>
> Cc: edk2-devel@lists.01.org
> Subject: RE: Edk2 uni file encoding
> 
> Sean:
>   EDKII UNI spec
> (https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications)
> Chapter 2 defines UNI file format.
> EdkCompatibilityPkg is obsolete. BZ
> https://bugzilla.tianocore.org/show_bug.cgi?id=1103
> is submitted to delete EdkCompatibilityPkg from
> edk2/master. We will work on it.
> 
> EDK II Unicode files are used for mapping token names
> to localized strings that are identified by an RFC4646
> language code. The format for storing EDK II Unicode
> files on disk is UTF-8 (without a BOM character) or
> UTF-16LE (with a BOM character). The character content
> must be UCS-2.
> 
> Thanks
> Liming
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-
> boun...@lists.01.org] On Behalf Of
> >Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: edk2-devel@lists.01.org
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for
> all UNI files in edk2?
> >If not I would like to propose one.  Incorrect
> encoding causes tool
> >issues and is something we can easily check for and
> fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >
> >  1.  UTF-8
> >Or
> >
> >  1.  Use a BOM and be UTF-16
> >
> >https://en.wikipedia.org/wiki/Byte_order_mark
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file:
> >EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question:  Can we make rule for all other
> standard file types
> >(c, h, dec, dsc, fdf, inf,)?
> >
> >Thanks
> >Sean
> >
> >
> >
> >_______________________________________________
> >edk2-devel mailing list
> >edk2-devel@lists.01.org
> >https://lists.01.org/mailman/listinfo/edk2-devel
