Oh, actually I have.

I even have a case that fails with mcs but works with csc,
i.e. the case where csc detects UTF-8 regardless of BOM.


I forgot one thing: with regard to that remaining problem, we need
to fix the WinForms build, because KeyboardLayouts.cs seems to contain
raw non-ASCII characters:

syntax error, got token `IDENTIFIER'
System.Windows.Forms\KeyboardLayouts.cs(93,51): error CS1526: A new expression requires () or [] after type
System.Windows.Forms\KeyboardLayouts.cs(97,62): error CS8025: Parsing error
Compilation failed: 2 error(s), 0 warnings

They should be replaced by \uXXXX but I have no idea what those
characters actually are :|
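
One way to find out what the characters are would be a small scanner that
prints the \uXXXX escape for every non-ASCII UTF-16 code unit in a file.
This is only a sketch: NonAsciiFinder and Escape are hypothetical names,
not part of mcs or the WinForms tree, and the file is assumed to be UTF-8.
If that assumption is wrong, the decoder will report U+FFFD replacement
characters instead of the real ones.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of mcs or System.Windows.Forms.
public static class NonAsciiFinder
{
    // Returns the \uXXXX escape for one UTF-16 code unit.
    public static string Escape (char c)
    {
        return "\\u" + ((int) c).ToString ("X4");
    }

    public static void Main (string [] args)
    {
        if (args.Length != 1) {
            Console.Error.WriteLine ("usage: NonAsciiFinder <file.cs>");
            return;
        }

        // Assumption: the file is UTF-8. Pass another encoding if it is not.
        string [] lines = File.ReadAllLines (args [0], Encoding.UTF8);
        for (int i = 0; i < lines.Length; i++) {
            for (int j = 0; j < lines [i].Length; j++) {
                char c = lines [i] [j];
                if (c > 0x7F)
                    // 1-based line and character index, then the escape.
                    Console.WriteLine ("({0},{1}): {2}", i + 1, j + 1, Escape (c));
            }
        }
    }
}
```

The reported positions are character indexes within the decoded line, so
they need not match the columns in the compiler errors above.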

Atsushi Eno


Marek Safar wrote:
Hello Eno,

Could you write some tests to cover this functionality? I mean, e.g., a simple test file with a UTF header.

Thanks,
Marek

Hi again,

Agreed. In fact, I was also fixing bug #75065, which may be a duplicate.
I have a fix for UTF8Encoding, but it uncovered another mcs bug:
it does not handle BOM-prefixed files with a specific encoding.
To summarize the situation:

    - Currently driver.cs does not process source files with
      default encoding.
    - UTF8Encoding.cs does not handle U+FEFF correctly.
    - When we fix UTF8Encoding.cs to handle U+FEFF, it starts
      to reject some source files that have a BOM.
      (CS8025: Parsing error)
    - Even if we fix driver.cs to let StreamReader consider BOM
      (currently we disable it), there are still some files
      borking.
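
To illustrate the U+FEFF items above: when the decoder does not skip the
BOM, the decoded text starts with U+FEFF, and whoever consumes the text
(a reader or a tokenizer) has to drop it. The sketch below shows that with
the standard UTF8Encoding class. BomDemo, StartsWithBom and StripBom are
hypothetical names for illustration and are not taken from mcs.

```csharp
using System;
using System.Text;

// Hypothetical illustration, not code from mcs or corlib.
public static class BomDemo
{
    // True when the decoded text still starts with U+FEFF.
    public static bool StartsWithBom (string text)
    {
        return text.Length > 0 && text [0] == '\uFEFF';
    }

    // Drops one leading U+FEFF, as the consumer of the text would have to.
    public static string StripBom (string text)
    {
        return StartsWithBom (text) ? text.Substring (1) : text;
    }

    public static void Main ()
    {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "cs".
        byte [] bytes = { 0xEF, 0xBB, 0xBF, (byte) 'c', (byte) 's' };

        // GetString decodes the bytes as-is; it does not remove the BOM.
        string decoded = new UTF8Encoding (false).GetString (bytes);

        Console.WriteLine (StartsWithBom (decoded));
        Console.WriteLine (StripBom (decoded));
    }
}
```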

I am digging into this bug in depth. Hopefully I'll post a set of
fixes later.


... and now I have finished the fixes, as done in the attached patch:

    - driver.cs :
      a) Use Encoding.Default as the default input encoding.
      b) Always pass true for BOM detection.
    - support.cs : Handle preamble_size precisely.
    - UTF8Encoding.cs : it should not skip U+FEFF. This fixes
      bug #73086 and #75065.

They should all be applied at the same time, except for a).
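
For readers without the patch at hand, the driver.cs part roughly comes
down to the StreamReader (Stream, Encoding, bool) constructor: the encoding
argument is the fallback, and the bool asks the reader to look for a BOM
first. This is a minimal sketch of that idea, not the patch itself;
BomAwareReader and ReadSource are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch of the idea, not the actual driver.cs code.
public static class BomAwareReader
{
    public static string ReadSource (Stream input)
    {
        // Encoding.Default is the fallback; 'true' lets a BOM override it.
        using (StreamReader reader = new StreamReader (input, Encoding.Default, true))
            return reader.ReadToEnd ();
    }

    public static void Main ()
    {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "cs".
        byte [] bytes = { 0xEF, 0xBB, 0xBF, (byte) 'c', (byte) 's' };
        string text = ReadSource (new MemoryStream (bytes));

        // Dump the code units so a leftover U+FEFF would be visible.
        foreach (char c in text)
            Console.WriteLine ("{0:X4}", (int) c);
    }
}
```

With BOM detection on, the reader is expected to consume the BOM itself,
so the text handed to the tokenizer should start at the first real
character.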

Atsushi Eno
public class 쯠쯡쯢
{
        public string 颀顰飳;

        public static void Main ()
        {
        }
}

public class 쯠쯡쯢
{
        static string 颀顰飳 = "頃頇";
        public static void Main ()
        {
                foreach (char c in 颀顰飳)
                        System.Console.WriteLine ("{0:X04}", (int) c);
        }
}

Index: Makefile
===================================================================
--- Makefile    (revision 48630)
+++ Makefile    (working copy)
@@ -2,7 +2,7 @@
 include ../../build/rules.make
 
 LIBRARY = System.Windows.Forms.dll
-LIB_MCS_FLAGS = /unsafe \
+LIB_MCS_FLAGS = /unsafe /codepage:65001 \
        /r:$(corlib) /r:System.dll /r:System.Xml.dll \
        /r:System.Drawing.dll /r:Accessibility.dll \
        /r:System.Data.dll /r:Mono.Posix.dll \
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
