Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

Kornél Pál wrote:

For new byte[1]: The following code, for example (and other methods called with an empty output buffer), has to throw ArgumentException (not the IndexOutOfRangeException that bytes[0] throws):

Encoding e = Encoding.GetEncoding(1252);
e.GetBytes(new char[] {'a'}, 0, 1, new byte[] {}, 0);

Does this really produce different results? I got ArgumentException from (unpatched) svn. (BTW, that's why we need NUnit tests as well.)

Atsushi Eno
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
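A minimal sketch of the behavior under discussion. Encoding.ASCII stands in for GetEncoding(1252) here (an assumption, so the sketch runs without the Windows-1252 code page being available); the question in the thread is which exception an empty destination buffer produces:

```csharp
using System;
using System.Text;

class EmptyBufferCheck
{
    static void Main()
    {
        // One input char, zero-length output buffer: the behavior
        // discussed in the thread is ArgumentException, not the
        // IndexOutOfRangeException a blind bytes[0] access causes.
        try {
            Encoding.ASCII.GetBytes(new char[] { 'a' }, 0, 1, new byte[0], 0);
            Console.WriteLine("no exception (unexpected)");
        } catch (ArgumentException) {
            Console.WriteLine("ArgumentException");
        } catch (IndexOutOfRangeException) {
            Console.WriteLine("IndexOutOfRangeException");
        }
    }
}
```

On MS.NET-compatible runtimes this prints "ArgumentException", which matches what Eno observed from unpatched svn.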
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hello,

Kornél Pál wrote:

This is the revised version of my original patch. Please approve this one and the other things can be modified (if needed) later.

Comments inline. (BTW, as it is often asked: text/plain is better than application/octet-stream for patch attachments, for review.)

@@ -146,8 +146,7 @@
 			return GetBytesInternal (charPtr + charIndex, charCount, bytePtr + byteIndex, byteCount);
 	}
 
-#if !NET_2_0
-	public unsafe override byte [] GetBytes (String s)
+	public override byte [] GetBytes (String s)
 	{
 		if (s == null)
 			throw new ArgumentNullException ("s");
@@ -155,14 +154,10 @@
 		int byteCount = GetByteCount (s);
 		byte [] bytes = new byte [byteCount];
 
-		if (byteCount != 0)
-			fixed (char* charPtr = s)
-			fixed (byte* bytePtr = bytes)
-				GetBytesInternal (charPtr, s.Length, bytePtr, byteCount);
+		GetBytes (s, 0, s.Length, bytes, 0);
 		return bytes;
 	}
-#endif
 
 	public unsafe override int GetBytes (String s, int charIndex, int charCount, byte [] bytes, int byteIndex)

Why do you *dare* to make an API mismatch here? Don't remove #if !NET_2_0. It does not improve anything.

@@ -300,6 +299,30 @@
 	}
 #endif
 
+	// Decode a buffer of bytes into a string.
+	public unsafe override String GetString (byte [] bytes, int index, int count)
+	{
+		if (bytes == null)
+			throw new ArgumentNullException ("bytes");
+		if (index < 0 || index > bytes.Length)
+			throw new ArgumentOutOfRangeException ("index", _("ArgRange_Array"));
+		if (count < 0 || count > (bytes.Length - index))
+			throw new ArgumentOutOfRangeException ("count", _("ArgRange_Array"));
+
+		if (count == 0)
+			return string.Empty;
+
+		// GetCharCountInternal
+		int charCount = count / 2;
+		string s = string.InternalAllocateStr (charCount);
+
+		fixed (byte* bytePtr = bytes)
+		fixed (char* charPtr = s)
+			GetCharsInternal (bytePtr + index, count, charPtr, charCount);
+
+		return s;
+	}
+
 	private unsafe int GetCharsInternal (byte* bytes, int byteCount, char* chars, int charCount)
 	{

So, this is the part that you said is (or will be) overridden only in Mono (for 1.1), right?
I think this 1/2 memory consumption is awesome :) Other than the first point it looks good. Thanks Kornél.

Atsushi Eno
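The "count / 2" allocation in the patched GetString (and the halved memory consumption praised above) rests on UTF-16 using exactly two bytes per char. A minimal sketch of that size relation using only the public API:

```csharp
using System;
using System.Text;

class Utf16Sizes
{
    static void Main()
    {
        // Encoding.Unicode is UTF-16LE: every char encodes to exactly
        // 2 bytes, so a byte buffer of length `count` decodes to exactly
        // count / 2 chars. That is why GetString can allocate the result
        // string up front and decode straight into it, instead of going
        // through an intermediate char[].
        string s = "Mono";
        byte[] bytes = Encoding.Unicode.GetBytes(s);
        Console.WriteLine(bytes.Length);                 // prints 8 (2 * s.Length)
        Console.WriteLine(Encoding.Unicode.GetString(bytes, 0, bytes.Length)); // prints "Mono"
    }
}
```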
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

Numbers are things that can convince me. :) Now I have three questions:

- Are there parts of the patch that are OK to commit?
- Do we care about class signature (which methods are overridden)?
- Do we care about the implementation of virtual methods (which methods they call)?

I can follow any guidelines - although I don't believe in performance above everything else - but I would like to know them, otherwise I cannot follow them.

Kornél

----- Original Message -----
From: Atsushi Eno [EMAIL PROTECTED]
To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Tuesday, April 11, 2006 6:56 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Hi,

I'm not interested in how your patch accomplishes MS.NET compatibility. My question is simple: is your patch *good* for Mono?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class Test
{
	public static void Main (string [] args)
	{
		int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
		string s = File.OpenText (args [0]).ReadToEnd ();
		Encoding e = Encoding.Unicode;
		Stopwatch sw = Stopwatch.StartNew ();
		for (int i = 0; i < loop; i++)
			e.GetBytes (s);
		sw.Stop ();
		Console.WriteLine (sw.ElapsedMilliseconds);
	}
}

Before your patch:

mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
5038

After the patch:

$ rundev2 mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
10175

Atsushi Eno
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

I've done some tests:

New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469

I have the following conclusions:

UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden. But performance is improved in 1.1 because the overridden implementation is optimized for UnicodeEncoding.

In ASCIIEncoding you can see the drawback of doing optimizations in the Encoding class: the current code is only faster on 2.0. With the new code, 1.1 didn't change because it doesn't use unsafe code.

There is no change in UTF8Encoding (or very little; the improvement is minimal).

CP932 is faster because the optimization is done in MonoEncoding.

As a conclusion, I think that Encoding should be MS.NET compatible because it's the class most likely to be used by users. And no improvement can be done in profile 1.1 because there are no unsafe methods, so there is no point in sacrificing compatibility for performance.

I think that the best solution for encoding optimization is to use a single unsafe implementation (for each piece of functionality: GetBytes, GetChars, GetByteCount, GetCharCount), with the other overloads (string, char[], byte[]) calling this single implementation. This makes the code more maintainable as well. This is what I've done in UnicodeEncoding.

And I think the place where we shouldn't care about MS.NET compatibility is the derived public encoding classes; we should override as many methods as we need, even if they aren't overridden in MS.NET. (For private encoding classes, layout compatibility is not a requirement.) For example, if I remove !NET_2_0 and NET_2_0 from GetBytes(string) and GetString(byte[], int, int) in UnicodeEncoding, a significant performance improvement can be achieved in all profiles.
Is this deal acceptable? If you have any objections please let me know.

Kornél

----- Original Message -----
From: Kornél Pál [EMAIL PROTECTED]
To: Atsushi Eno [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 12:35 AM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
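The "single core implementation plus thin wrappers" layout argued for above can be sketched as follows. This is a hypothetical, safe-code stand-in (the class name and the trivial char-to-two-bytes conversion are inventions for illustration; the real Mono classes use unsafe pointer methods like GetBytesInternal):

```csharp
using System;

// Hypothetical sketch: one core conversion routine, with the
// string/char[] overloads reduced to thin wrappers that all
// funnel into it, so only one place needs optimizing.
class WidthDoublingEncoder
{
    // The single core implementation (a trivial UTF-16LE-style
    // char -> 2 bytes expansion, standing in for GetBytesInternal).
    static int GetBytesCore(char[] chars, int charIndex, int charCount,
                            byte[] bytes, int byteIndex)
    {
        for (int i = 0; i < charCount; i++) {
            char c = chars[charIndex + i];
            bytes[byteIndex + 2 * i] = (byte)(c & 0xFF);
            bytes[byteIndex + 2 * i + 1] = (byte)(c >> 8);
        }
        return charCount * 2;
    }

    // Wrappers: every public entry point calls the core routine.
    public static byte[] GetBytes(string s)
    {
        byte[] bytes = new byte[s.Length * 2];
        GetBytesCore(s.ToCharArray(), 0, s.Length, bytes, 0);
        return bytes;
    }

    public static byte[] GetBytes(char[] chars)
    {
        byte[] bytes = new byte[chars.Length * 2];
        GetBytesCore(chars, 0, chars.Length, bytes, 0);
        return bytes;
    }

    static void Main()
    {
        byte[] b = GetBytes("A");
        Console.WriteLine(b[0] + "," + b[1]); // prints 65,0
    }
}
```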
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Just a small doubt: how could you run your test that uses Stopwatch under the 1.x profile?

Atsushi Eno

Kornél Pál wrote:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace Test
{
	public class Test
	{
		public static int loop;
		public static string s;

		public static void Main (string [] args)
		{
			loop = args.Length > 1 ? int.Parse (args [1]) : 100;
			s = File.OpenText (args [0]).ReadToEnd ();
			Do (Encoding.Unicode);
			Do (Encoding.ASCII);
			Do (Encoding.UTF8);
			Do (Encoding.GetEncoding (932));
		}

		public static void Do (Encoding e)
		{
			Stopwatch sw = Stopwatch.StartNew ();
			for (int i = 0; i < loop; i++)
				e.GetBytes (s);
			sw.Stop ();
			Console.WriteLine (e.GetType ().Name + ": " + sw.ElapsedMilliseconds.ToString ());
		}
	}
}
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
I included Stopwatch.cs in the exe. (Removed MonoTODO and compiled.) And used the same executable on 2.0 as well, with a config file. (This was a lazy solution but it's fine. :)

Kornél

----- Original Message -----
From: Atsushi Eno [EMAIL PROTECTED]
To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 1:13 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Just a small doubt: how could you run your test that uses Stopwatch under the 1.x profile?

Atsushi Eno
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

Here's the real answer.

I might not be fully understanding you, but if you are saying that your current patch as is should be applied, then it's a no-go (due to the big difference in ASCII and Unicode as you showed us).

Note that I'm not saying that performance is always a higher-rated matter than compatibility (I'm actually more of an anti-pro-optimization dude than others). If there is a way that achieves this compatibility and does not harm performance, it'd be awesome. I'm just not for *extremism*. The reason you were once convinced was not because the evidence is numbers but because the differences are significant. (Hey, there is no doubt that I love your detailed analysis, BTW :-)

I agree with you that we had better feel free to override virtual stuff where that does not result in MissingMethodException (but it might be only myself).

For individual changes other than that performance loss, there is certain goodness in your patches. But of some I'm not convinced (such as giving new byte [1]) because you don't provide supporting NUnit tests. If you don't write any, I will create ones for the changes that I am convinced of. But as I've written in the first reply, the difference is so minor that it is low priority for me.

BTW, thanks for the decent tester code. It convinced me that there are still some optimizable things.

Atsushi Eno
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
For new byte[1]: The following code, for example (and other methods called with an empty output buffer), has to throw ArgumentException (not the IndexOutOfRangeException that bytes[0] throws):

Encoding e = Encoding.GetEncoding(1252);
e.GetBytes(new char[] {'a'}, 0, 1, new byte[] {}, 0);

I didn't previously find the following line in MonoEncoding:

if (bytes.Length - byteIndex < charCount)
	throw new ArgumentException (Strings.GetString ("Arg_InsufficientSpace"), "bytes");

This will eliminate the IndexOutOfRangeException, so no new byte[1] is required, but actually I'm not sure whether it's correct/safe to assume that GetByteCount returns charCount. (For DBCS it can be more; when using no fallback character (ignoring invalid bytes) it can be less as well.)

As MonoEncoding is the base encoding class of I18N, it may be better to move this check to a higher level, use new byte[1] in MonoEncoding, and let GetBytesImpl check the buffer size.

For the other things I haven't got enough time now, but I'm happy that we actually have very similar opinions, and I like discussing things with you. :)

Kornél

----- Original Message -----
From: Atsushi Eno [EMAIL PROTECTED]
To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 2:04 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
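Kornél's doubt about assuming GetByteCount equals charCount can be seen with a safe stand-in. UTF-8 is used here instead of a DBCS code page (an assumption, chosen because it needs no extra code-page tables), but it shows the same property: a multi-byte encoding can need more output bytes than it has input chars, so a `bytes.Length - byteIndex < charCount` check alone cannot guarantee the buffer is large enough:

```csharp
using System;
using System.Text;

class ByteCountVsCharCount
{
    static void Main()
    {
        // 'é' is one char but two UTF-8 bytes, so a buffer-size check
        // based on charCount alone would accept a one-byte buffer that
        // is in fact too small (the DBCS concern raised in the thread).
        char[] chars = { '\u00E9' };
        Console.WriteLine(Encoding.UTF8.GetByteCount(chars, 0, 1)); // prints 2
    }
}
```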
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

This is the revised version of my original patch. Please approve this one and the other things can be modified (if needed) later.

Kornél

UnicodeEncoding.diff
Description: Binary data
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

I had some time and looked at all the encoding classes in I18N and in System.Text.

byte* and char* are only used in UnicodeEncoding, and in GetByteCount and GetBytes in I18N. This means that keeping the #if NET_2_0 code that you don't want to remove will cause a performance loss on profile 2.0 in System.Text, while it will not improve performance in profile 1.1, as no such optimization is done there.

The solution is to use arrays in Encoding, which improves simple, old-fashioned encoding classes, but to override these methods to use pointers in classes that implement their core functionality using unsafe code. Encodings in System.Text (except UnicodeEncoding) use arrays, and I think custom encodings created by users are array based as well, so it results in better performance if we use arrays in Encoding. If custom encodings use unsafe code, they will have to override the other methods anyway (because of MS.NET) to get the performance improvement.

By overriding GetByteCount (string) and GetBytes (string) in MonoEncoding, the performance improvement from unsafe code will be preserved; in addition, it will be available in all profiles. MonoEncoding was already good, so I just added these two methods and added the following code to the GetBytes methods:

int byteCount = bytes.Length - byteIndex;
if (bytes.Length == 0)
	bytes = new byte [1];

Some check is required because bytes[0] will fail for zero-size arrays. A bytes.Length == byteIndex check could avoid this (but was present in only one of the methods); however, this would prevent ArgumentException from being thrown for too-small output buffers. Creating a small array is little overhead, and an exception will probably be thrown anyway because charCount is non-zero.

Attached is an improved patch. Please review the patch.

Kornél

----- Original Message -----
From: Kornél Pál [EMAIL PROTECTED]
To: Atsushi Eno [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Monday, April 10, 2006 3:53 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Now I haven't got enough time for a detailed reply (I'll do that later) but I have the following general opinion:

For a public abstract base class like Encoding, I think it's important to use the same wrappers as MS.NET does, because people probably assume this, and their code will behave differently on Mono otherwise if they only override some methods. I18N encodings, however, are totally internal, so nobody can override them (if obtained using GetEncoding), so they can do whatever we want.

I saw that you use pointer-based conversion in MonoEncoding; that is good and fast, should not be reverted, and is not affected by the bad performance of Encoding. I think Encoding should be MS.NET compatible, but MonoEncoding can override any methods of Encoding that are inefficient. So I think we should hold Encoding back to provide compatibility, and update MonoEncoding (or leave it, if it's already good enough) for better performance.

Note that Encoding in MS.NET 2.0 is just as inefficient (if not more inefficient) as in 1.x, but they created their own EncodingNLS (I guess this is that class) to address this problem. We could do the same using MonoEncoding.

Type type = Encoding.GetEncoding(1252).GetType();
while (type != null)
{
	Console.WriteLine(type.FullName);
	type = type.BaseType;
}

System.Text.SBCSCodePageEncoding
System.Text.BaseCodePageEncoding
System.Text.EncodingNLS
System.Text.Encoding
System.Object

Kornél

----- Original Message -----
From: Atsushi Eno [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Monday, April 10, 2006 2:48 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Hi,

Kornél Pál wrote:

Hi,

Now I understood why UnicodeEncoding.GetBytes(string) is overridden in MS.NET 1.x but not in MS.NET 2.0. Encoding in MS.NET uses char[] to convert strings in all versions, and GetBytes(string) calls an overload that takes char[] as well. (This is a difference compared to Mono, as it uses char* in 2.0.) And I think MS realized that they should make GetBytes(string) a higher-level wrapper, just like the other ones, and call GetBytes(string, int, int, byte[], int), like the overridden method in UnicodeEncoding does. But then they realized that this would break compatibility with MS.NET 1.x, so they dropped the modification done to Encoding.GetBytes(string) but forgot to put back the override in UnicodeEncoding, so the 2.0 UnicodeEncoding.GetBytes(string) is actually less efficient than in 1.0.

I updated the patch to call the right method in UnicodeEncoding.GetBytes(string).

Also note that Encoding in Mono is using the new unsafe methods for GetBytes that take a string, but MS.NET is not doing this optimization and is using char[] instead, which is more efficient when the new unsafe methods are not overridden, as they convert pointers back to arrays by default. In addition, calling the same methods improves compatibility.

Umm, I don't
Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
Hi,

I'm not interested in how your patch accomplishes MS.NET compatibility. My question is simple: is your patch *good* for Mono?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class Test
{
	public static void Main (string [] args)
	{
		int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
		string s = File.OpenText (args [0]).ReadToEnd ();
		Encoding e = Encoding.Unicode;
		Stopwatch sw = Stopwatch.StartNew ();
		for (int i = 0; i < loop; i++)
			e.GetBytes (s);
		sw.Stop ();
		Console.WriteLine (sw.ElapsedMilliseconds);
	}
}

Before your patch:

mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
5038

After the patch:

$ rundev2 mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
10175

Atsushi Eno