Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-13 Thread Atsushi Eno

Hi,

Kornél Pál wrote:

For new byte[1]:

The following code, for example (and other methods with an empty output
buffer), has to throw ArgumentException (not the IndexOutOfRangeException
that bytes[0] throws):


Encoding e = Encoding.GetEncoding(1252);
e.GetBytes(new char[] {'a'}, 0, 1, new byte[] {}, 0);


Does this really make a difference in the results? I got ArgumentException
from (unpatched) svn.

(BTW that's why we need NUnit tests as well.)
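
A minimal sketch of the kind of NUnit test that would pin this behaviour down
(the fixture name and layout are illustrative, not the actual corlib test suite):

using System;
using System.Text;
using NUnit.Framework;

[TestFixture]
public class EncodingEmptyBufferTest
{
	[Test]
	public void GetBytesWithEmptyOutputBuffer ()
	{
		Encoding e = Encoding.GetEncoding (1252);
		try {
			// One input char but a zero-length output buffer: this must
			// surface as ArgumentException, not IndexOutOfRangeException.
			e.GetBytes (new char [] {'a'}, 0, 1, new byte [0], 0);
			Assert.Fail ("ArgumentException was expected");
		} catch (ArgumentException) {
			// expected
		}
	}
}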

Atsushi Eno
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-13 Thread Atsushi Eno

Hello,

This is the revised version of my original patch. Please approve this 
one and the other things can be modified (if needed) later.


Comments inline (BTW, as is often requested, text/plain is better than
application/octet-stream for patch attachments that are meant for review).


@@ -146,8 +146,7 @@
 			return GetBytesInternal (charPtr + charIndex, charCount, bytePtr + byteIndex, byteCount);
 		}
 
-#if !NET_2_0
-	public unsafe override byte [] GetBytes (String s)
+	public override byte [] GetBytes (String s)
 	{
 		if (s == null)
 			throw new ArgumentNullException ("s");
@@ -155,14 +154,10 @@
 		int byteCount = GetByteCount (s);
 		byte [] bytes = new byte [byteCount];
 
-		if (byteCount != 0)
-			fixed (char* charPtr = s)
-				fixed (byte* bytePtr = bytes)
-					GetBytesInternal (charPtr, s.Length, bytePtr, byteCount);
+		GetBytes (s, 0, s.Length, bytes, 0);
 
 		return bytes;
 	}
-#endif
 
 	public unsafe override int GetBytes (String s, int charIndex, int charCount,
 					     byte [] bytes, int byteIndex)


Why do you *dare* to create an API mismatch here? Don't remove the #if NET_2_0.
It does not improve anything.


@@ -300,6 +299,30 @@
 	}
 #endif
 
+	// Decode a buffer of bytes into a string.
+	public unsafe override String GetString (byte [] bytes, int index, int count)
+	{
+		if (bytes == null)
+			throw new ArgumentNullException ("bytes");
+		if (index < 0 || index > bytes.Length)
+			throw new ArgumentOutOfRangeException ("index", _("ArgRange_Array"));
+		if (count < 0 || count > (bytes.Length - index))
+			throw new ArgumentOutOfRangeException ("count", _("ArgRange_Array"));
+
+		if (count == 0)
+			return string.Empty;
+
+		// GetCharCountInternal
+		int charCount = count / 2;
+		string s = string.InternalAllocateStr (charCount);
+
+		fixed (byte* bytePtr = bytes)
+		fixed (char* charPtr = s)
+			GetCharsInternal (bytePtr + index, count, charPtr, charCount);
+
+		return s;
+	}
+
 	private unsafe int GetCharsInternal (byte* bytes, int byteCount,
 					     char* chars, int charCount)
 	{


So, this is the part that you said is (or will be) overridden only in
Mono (for 1.1), right? I think this halving of memory consumption is awesome :)
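
For reference, a rough sketch (not the actual corlib source) of the path the
override avoids: the generic base-class GetString has to materialize a char[]
and then copy it into a new string, while the override above decodes directly
into the freshly allocated string, hence roughly half the memory traffic.

using System.Text;

static class GetStringSketch
{
	// The two-allocation route a non-overriding encoding takes.
	public static string ViaBasePath (Encoding e, byte [] bytes, int index, int count)
	{
		char [] chars = e.GetChars (bytes, index, count); // first buffer
		return new string (chars);                        // second buffer + copy
	}
}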

Other than the first point it looks good. Thanks Kornél.

Atsushi Eno
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Kornél Pál

Hi,

Numbers are things that can convince me.:)

Now I have three questions:
- Are there parts of the patch that are OK to commit?
- Do we care about the class signature (which methods are overridden)?
- Do we care about the implementation of the virtual methods (which methods do 
they call)?


I can follow any guidelines - although I don't believe in performance above 
everything else - but I would like to know them; otherwise I cannot follow 
them.


Kornél

- Original Message - 
From: Atsushi Eno [EMAIL PROTECTED]

To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Tuesday, April 11, 2006 6:56 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 
and modify some Encoding wrappers




Hi,

I'm not interested in how your patch accomplishes MS.NET compatibility.
My question is simple: is your patch *good* for Mono?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class Test
{
public static void Main (string [] args)
{
int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
string s = File.OpenText (args [0]).ReadToEnd ();
Encoding e = Encoding.Unicode;
Stopwatch sw = Stopwatch.StartNew ();
for (int i = 0; i < loop; i++)
e.GetBytes (s);
sw.Stop ();
Console.WriteLine (sw.ElapsedMilliseconds);
}
}

Before your patch:
mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
5038

After the patch:
$ rundev2 mono ./unicode.exe 
../../svn/mono/web/web/masterinfos/System.Web.xml

10175

Atsushi Eno

Kornél Pál wrote:

Hi,

I had some time and looked at all the encoding classes in I18N and in 
System.Text.


byte* and char* are used only in UnicodeEncoding, and in GetByteCount and 
GetBytes in I18N.


This means that having the #if NET_2_0 code that you don't want to remove 
will cause a performance loss on profile 2.0 in System.Text, while it will not 
improve performance in profile 1.1, as no such optimization is done there.


The solution is to use arrays in Encoding, which improves the simple, 
old-fashioned encoding classes, but to override these methods to use pointers 
in the classes that implement their core functionality using unsafe code.


Encodings in System.Text (except UnicodeEncoding) use arrays, and I think 
custom encodings created by users are array based as well, so it results in 
better performance if we use arrays in Encoding. If custom encodings are using 
unsafe code they will have to override the other methods anyway (because of 
MS.NET) to get the performance improvement.


By overriding GetByteCount (string) and GetBytes (string) in MonoEncoding, the 
performance improvement from unsafe code will be preserved; in addition, it 
will be available in all profiles.


MonoEncoding was already good, so I just added these two methods and added 
the following code to the GetBytes methods:


int byteCount = bytes.Length - byteIndex;
if (bytes.Length == 0)
bytes = new byte [1];

Some check is required because bytes[0] will fail for zero-size arrays. A 
bytes.Length == byteIndex check could avoid this (but was present in only one 
of the methods), but it would prevent ArgumentException from being thrown for 
too-small output buffers. Creating a small array is little overhead, and an 
exception will probably be thrown anyway because charCount is non-zero.
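
A hedged sketch of the wrapper shape being described; GetBytesImpl is a
stand-in name for the unsafe core conversion, not the exact MonoEncoding API,
and argument validation is omitted:

public abstract class EncodingWrapperSketch
{
	protected unsafe abstract int GetBytesImpl (char* chars, int charCount,
						    byte* bytes, int byteCount);

	public unsafe int GetBytes (char [] chars, int charIndex, int charCount,
				    byte [] bytes, int byteIndex)
	{
		int byteCount = bytes.Length - byteIndex;
		// Substitute a one-element array so taking &bytes[0] below cannot
		// throw IndexOutOfRangeException for a zero-length array. byteCount
		// still reflects the real buffer, so the core implementation can
		// throw ArgumentException when the output buffer is really too small.
		if (bytes.Length == 0)
			bytes = new byte [1];

		fixed (char* charPtr = chars)
		fixed (byte* bytePtr = &bytes [0])
			return GetBytesImpl (charPtr + charIndex, charCount,
					     bytePtr + byteIndex, byteCount);
	}
}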


Attached is an improved patch. Please review it.

Kornél 


___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Kornél Pál

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469

I have the following conclusion:

UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden. 
But performance is improved in 1.1 because the overridden implementation is 
optimized for UnicodeEncoding.


In ASCIIEncoding you can see the drawback of doing optimizations in the 
Encoding class, because the current code is only faster on 2.0. With the new 
code, 1.1 didn't change because it does not use unsafe code.


There is little or no change in UTF8Encoding (any improvement is minimal).

CP932 is faster because optimization is done in MonoEncoding.

In conclusion, I think that Encoding should be MS.NET compatible because it is 
the class most likely to be used by users directly. And no improvement can be 
made in profile 1.1, because there are no unsafe methods there, so there is no 
point in sacrificing compatibility for performance.


I think that the best solution for encoding optimization is to use a single 
unsafe implementation (for each piece of functionality: GetBytes, GetChars, 
GetByteCount, GetCharCount) and have the other methods (string, char[], byte[]) 
call this single implementation. This makes the code more maintainable as 
well. This is what I've done in UnicodeEncoding.
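
A minimal sketch of that layout, assuming a hypothetical GetBytesCore as the
single unsafe implementation (names are illustrative, not the actual
UnicodeEncoding internals, and validation is omitted):

public abstract class SingleCoreEncodingSketch
{
	// The one real conversion, pointer based.
	protected unsafe abstract int GetBytesCore (char* chars, int charCount,
						    byte* bytes, int byteCount);

	// Thin wrappers: every public overload funnels into GetBytesCore.
	public unsafe int GetBytes (char [] chars, int charIndex, int charCount,
				    byte [] bytes, int byteIndex)
	{
		fixed (char* charPtr = chars)
		fixed (byte* bytePtr = bytes)
			return GetBytesCore (charPtr + charIndex, charCount,
					     bytePtr + byteIndex, bytes.Length - byteIndex);
	}

	public unsafe int GetBytes (string s, int charIndex, int charCount,
				    byte [] bytes, int byteIndex)
	{
		fixed (char* charPtr = s)
		fixed (byte* bytePtr = bytes)
			return GetBytesCore (charPtr + charIndex, charCount,
					     bytePtr + byteIndex, bytes.Length - byteIndex);
	}
}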


And I think the place where we shouldn't care about MS.NET compatibility is 
the derived public encoding classes; we should override as many methods as we 
need, even if they aren't overridden in MS.NET. (For private encoding classes, 
layout compatibility is not a requirement.)


For example, if I remove !NET_2_0 and NET_2_0 from GetBytes(string) and 
GetString(byte[], int, int) in UnicodeEncoding, a significant performance 
improvement can be achieved in all profiles.


Is this deal acceptable? If you have any objections please let me know.

Kornél

- Original Message - 
From: Kornél Pál [EMAIL PROTECTED]

To: Atsushi Eno [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 12:35 AM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 
and modify some Encoding wrappers




Hi,

Numbers are things that can convince me.:)

Now I have three questions:
- Are there parts of the patch that are OK to commit?
- Do we care about the class signature (which methods are overridden)?
- Do we care about the implementation of the virtual methods (which methods do 
they call)?


I can follow any guidelines - although I don't believe in performance above 
everything else - but I would like to know them; otherwise I cannot follow 
them.


Kornél

- Original Message - 
From: Atsushi Eno [EMAIL PROTECTED]

To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Tuesday, April 11, 2006 6:56 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 
and modify some Encoding wrappers




Hi,

I'm not interested in how your patch accomplishes MS.NET compatibility.
My question is simple: is your patch *good* for Mono?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class Test
{
public static void Main (string [] args)
{
int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
string s = File.OpenText (args [0]).ReadToEnd ();
Encoding e = Encoding.Unicode;
Stopwatch sw = Stopwatch.StartNew ();
for (int i = 0; i < loop; i++)
e.GetBytes (s);
sw.Stop ();
Console.WriteLine (sw.ElapsedMilliseconds);
}
}

Before your patch:
mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
5038

After the patch:
$ rundev2 mono ./unicode.exe 
../../svn/mono/web/web/masterinfos/System.Web.xml

10175

Atsushi Eno

Kornél Pál wrote:

Hi,

I had some time and looked at all the encoding classes in I18N and in 
System.Text.


byte* and char* are used only in UnicodeEncoding, and in GetByteCount and 
GetBytes in I18N.


This means that having the #if NET_2_0 code that you don't want to remove 
will cause a performance loss on profile 2.0 in System.Text, while it will not 
improve performance in profile 1.1, as no such optimization is done there.


The solution is to use arrays in Encoding, which improves the simple, 
old-fashioned encoding classes, but to override these methods to use pointers 
in the classes that implement their core functionality using unsafe code.


Encodings in System.Text (except UnicodeEncoding) use arrays, and I think 
custom encodings created by users are array based as well, so it results in 
better performance if we use arrays in Encoding. If custom encodings are using 
unsafe code they will have to override the other methods anyway (because of 
MS.NET) to get the performance improvement.

Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Atsushi Eno

Just a small doubt: how could you run your test that uses Stopwatch
under the 1.x profile?

Atsushi Eno

Kornél Pál wrote:

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469



using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace Test
{
public class Test
{
public static int loop;
public static string s;

public static void Main(string[] args)
{
loop = args.Length > 1 ? int.Parse(args[1]) : 100;
s = File.OpenText(args[0]).ReadToEnd();

Do(Encoding.Unicode);
Do(Encoding.ASCII);
Do(Encoding.UTF8);
Do(Encoding.GetEncoding(932));
}

public static void Do(Encoding e)
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
e.GetBytes(s);
sw.Stop();
Console.WriteLine(e.GetType().Name + ": " + 
sw.ElapsedMilliseconds.ToString());

}
}
}


___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Kornél Pál
I included Stopwatch.cs in the exe. (Removed MonoTODO and compiled.) And I used 
the same executable on 2.0 as well, with a config file. (This was a lazy 
solution but it's fine. :)
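
An alternative sketch for timing on the 1.x profile without pulling in
Stopwatch.cs (not what was done here): Environment.TickCount has been available
since 1.0, at the cost of coarser resolution.

using System;
using System.Text;

class TickCountTimer
{
	public static void Time (Encoding e, string s, int loop)
	{
		int start = Environment.TickCount;
		for (int i = 0; i < loop; i++)
			e.GetBytes (s);
		// TickCount is milliseconds since boot; the difference is the elapsed time.
		Console.WriteLine (e.GetType ().Name + ": " + (Environment.TickCount - start));
	}
}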


Kornél

- Original Message - 
From: Atsushi Eno [EMAIL PROTECTED]

To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 1:13 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 
and modify some Encoding wrappers




Just a small doubt: how could you run your test that uses Stopwatch
under the 1.x profile?

Atsushi Eno

Kornél Pál wrote:

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469



using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace Test
{
public class Test
{
public static int loop;
public static string s;

public static void Main(string[] args)
{
loop = args.Length > 1 ? int.Parse(args[1]) : 100;
s = File.OpenText(args[0]).ReadToEnd();

Do(Encoding.Unicode);
Do(Encoding.ASCII);
Do(Encoding.UTF8);
Do(Encoding.GetEncoding(932));
}

public static void Do(Encoding e)
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
e.GetBytes(s);
sw.Stop();
Console.WriteLine(e.GetType().Name + ": " + 
sw.ElapsedMilliseconds.ToString());

}
}
}




___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Atsushi Eno

Hi,

Here's the real answer.

I might not be fully understanding you, but if you are saying that
your current patch should be applied as is, then it's a no-go (due
to the big difference in ASCII and Unicode that you showed us).

Note that I'm not saying that performance is always a higher-rated
matter than compatibility (I'm actually more of an anti-optimization
guy than most). If there is a way to achieve this compatibility
without harming performance, it'd be awesome. I'm just not for
*extremism*. The reason you were once convinced was not that the
evidence came as numbers but that the differences were significant.

(Hey, there is no doubt that I love your detailed analysis BTW :-)

I agree with you that we should feel free to override virtual
stuff as long as it does not result in MissingMethodException (but it might
be only me).

For individual changes other than that performance loss, there is
certain goodness in your patches. But for some I'm not convinced
(such as the new byte [1] substitution) because you don't provide
NUnit tests as evidence.

If you don't write any, I will create some for the changes I am
convinced about. But as I wrote in my first reply, the difference is
so minor that it is low priority for me.

BTW, thanks for the decent tester code. It convinced me that there are
still some things that can be optimized.

Atsushi Eno

Kornél Pál wrote:

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469

I have the following conclusion:

UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden. 
But performance is improved in 1.1 because the overridden implementation is 
optimized for UnicodeEncoding.


In ASCIIEncoding you can see the drawback of doing optimizations in the 
Encoding class, because the current code is only faster on 2.0. With the new 
code, 1.1 didn't change because it does not use unsafe code.


There is little or no change in UTF8Encoding (any improvement is minimal).

CP932 is faster because optimization is done in MonoEncoding.

In conclusion, I think that Encoding should be MS.NET compatible because it is 
the class most likely to be used by users directly. And no improvement can be 
made in profile 1.1, because there are no unsafe methods there, so there is no 
point in sacrificing compatibility for performance.


I think that the best solution for encoding optimization is to use a single 
unsafe implementation (for each piece of functionality: GetBytes, GetChars, 
GetByteCount, GetCharCount) and have the other methods (string, char[], byte[]) 
call this single implementation. This makes the code more maintainable as 
well. This is what I've done in UnicodeEncoding.


And I think the place where we shouldn't care about MS.NET compatibility is 
the derived public encoding classes; we should override as many methods as we 
need, even if they aren't overridden in MS.NET. (For private encoding classes, 
layout compatibility is not a requirement.)


For example, if I remove !NET_2_0 and NET_2_0 from GetBytes(string) and 
GetString(byte[], int, int) in UnicodeEncoding, a significant performance 
improvement can be achieved in all profiles.


Is this deal acceptable? If you have any objections please let me know.

Kornél

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Kornél Pál

For new byte[1]:

The following code, for example (and other methods with an empty output 
buffer), has to throw ArgumentException (not the IndexOutOfRangeException that 
bytes[0] throws):


Encoding e = Encoding.GetEncoding(1252);
e.GetBytes(new char[] {'a'}, 0, 1, new byte[] {}, 0);

I didn't previously find the following line in MonoEncoding:
if (bytes.Length - byteIndex < charCount)
	throw new ArgumentException (Strings.GetString ("Arg_InsufficientSpace"), 
"bytes");


This will eliminate the IndexOutOfRangeException, so no new byte[1] is 
required, but actually I'm not sure whether it's correct/safe to assume that 
GetByteCount returns charCount. (For DBCS it can be more; when using no 
fallback character (ignoring invalid bytes) it can be less as well.) As 
MonoEncoding is the base encoding class of I18N, it may be better to move this 
check to a higher level, use new byte[1] in MonoEncoding, and let GetBytesImpl 
check the buffer size.


For the other things I haven't got enough time now, but I'm happy that we 
actually have very similar opinions, and I like discussing things with 
you. :)


Kornél

- Original Message - 
From: Atsushi Eno [EMAIL PROTECTED]

To: Kornél Pál [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Wednesday, April 12, 2006 2:04 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 
and modify some Encoding wrappers




Hi,

Here's the real answer.

I might not be fully understanding you, but if you are saying that
your current patch should be applied as is, then it's a no-go (due
to the big difference in ASCII and Unicode that you showed us).

Note that I'm not saying that performance is always a higher-rated
matter than compatibility (I'm actually more of an anti-optimization
guy than most). If there is a way to achieve this compatibility
without harming performance, it'd be awesome. I'm just not for
*extremism*. The reason you were once convinced was not that the
evidence came as numbers but that the differences were significant.

(Hey, there is no doubt that I love your detailed analysis BTW :-)

I agree with you that we should feel free to override virtual
stuff as long as it does not result in MissingMethodException (but it might
be only me).

For individual changes other than that performance loss, there is
certain goodness in your patches. But for some I'm not convinced
(such as the new byte [1] substitution) because you don't provide
NUnit tests as evidence.

If you don't write any, I will create some for the changes I am
convinced about. But as I wrote in my first reply, the difference is
so minor that it is low priority for me.

BTW, thanks for the decent tester code. It convinced me that there are
still some things that can be optimized.

Atsushi Eno

Kornél Pál wrote:

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469

I have the following conclusion:

UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden. 
But performance is improved in 1.1 because the overridden implementation is 
optimized for UnicodeEncoding.


In ASCIIEncoding you can see the drawback of doing optimizations in the 
Encoding class, because the current code is only faster on 2.0. With the new 
code, 1.1 didn't change because it does not use unsafe code.


There is little or no change in UTF8Encoding (any improvement is minimal).


CP932 is faster because optimization is done in MonoEncoding.

In conclusion, I think that Encoding should be MS.NET compatible because it is 
the class most likely to be used by users directly. And no improvement can be 
made in profile 1.1, because there are no unsafe methods there, so there is no 
point in sacrificing compatibility for performance.


I think that the best solution for encoding optimization is to use a single 
unsafe implementation (for each piece of functionality: GetBytes, GetChars, 
GetByteCount, GetCharCount) and have the other methods (string, char[], byte[]) 
call this single implementation. This makes the code more maintainable as 
well. This is what I've done in UnicodeEncoding.


And I think the place where we shouldn't care about MS.NET compatibility is 
the derived public encoding classes; we should override as many methods as we 
need, even if they aren't overridden in MS.NET. (For private encoding classes, 
layout compatibility is not a requirement.)


For example, if I remove !NET_2_0 and NET_2_0 from GetBytes(string) and 
GetString(byte[], int, int) in UnicodeEncoding, a significant performance 
improvement can be achieved in all profiles.


Is this deal acceptable? If you have any objections please let me know.

Kornél 


___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http

Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-12 Thread Kornél Pál

Hi,

This is the revised version of my original patch. Please approve this one 
and the other things can be modified (if needed) later.


Kornél 


UnicodeEncoding.diff
Description: Binary data
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-11 Thread Kornél Pál

Hi,

I had some time and looked at all the encoding classes in I18N and in 
System.Text.


byte* and char* are used only in UnicodeEncoding, and in GetByteCount and 
GetBytes in I18N.


This means that having the #if NET_2_0 code that you don't want to remove 
will cause a performance loss on profile 2.0 in System.Text, while it will not 
improve performance in profile 1.1, as no such optimization is done there.


The solution is to use arrays in Encoding, which improves the simple, 
old-fashioned encoding classes, but to override these methods to use pointers 
in the classes that implement their core functionality using unsafe code.


Encodings in System.Text (except UnicodeEncoding) use arrays, and I think 
custom encodings created by users are array based as well, so it results in 
better performance if we use arrays in Encoding. If custom encodings are using 
unsafe code they will have to override the other methods anyway (because of 
MS.NET) to get the performance improvement.


By overriding GetByteCount (string) and GetBytes (string) in MonoEncoding, the 
performance improvement from unsafe code will be preserved; in addition, it 
will be available in all profiles.


MonoEncoding was already good, so I just added these two methods and added 
the following code to the GetBytes methods:


int byteCount = bytes.Length - byteIndex;
if (bytes.Length == 0)
bytes = new byte [1];

Some check is required because bytes[0] will fail for zero-size arrays. A 
bytes.Length == byteIndex check could avoid this (but was present in only one 
of the methods), but it would prevent ArgumentException from being thrown for 
too-small output buffers. Creating a small array is little overhead, and an 
exception will probably be thrown anyway because charCount is non-zero.


Attached is an improved patch. Please review it.

Kornél

- Original Message - 
From: Kornél Pál [EMAIL PROTECTED]

To: Atsushi Eno [EMAIL PROTECTED]
Cc: mono-devel-list@lists.ximian.com
Sent: Monday, April 10, 2006 3:53 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
and modify some Encoding wrappers



Right now I haven't got enough time for a detailed reply (I'll do that later),
but I have the following general opinion:

For a public abstract base class like Encoding I think it's important to
use the same wrappers as MS.NET does, because people probably assume this,
and their code would otherwise behave differently on Mono if they only
override some methods.
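
A sketch of that concern, with a deliberately trivial user-defined encoding
(the name and the identity conversion are purely illustrative): it overrides
only the abstract, array-based members and relies on the inherited wrappers
such as GetBytes(string) and GetString(byte[]) to route through them, so those
wrappers need to behave like MS.NET's for the class to be portable.

using System.Text;

public class IdentityByteEncoding : Encoding
{
	public override int GetByteCount (char [] chars, int index, int count)
	{
		return count;
	}

	public override int GetBytes (char [] chars, int charIndex, int charCount,
				      byte [] bytes, int byteIndex)
	{
		// Naive narrowing conversion, just to have a concrete implementation.
		for (int i = 0; i < charCount; i++)
			bytes [byteIndex + i] = (byte) chars [charIndex + i];
		return charCount;
	}

	public override int GetCharCount (byte [] bytes, int index, int count)
	{
		return count;
	}

	public override int GetChars (byte [] bytes, int byteIndex, int byteCount,
				      char [] chars, int charIndex)
	{
		for (int i = 0; i < byteCount; i++)
			chars [charIndex + i] = (char) bytes [byteIndex + i];
		return byteCount;
	}

	public override int GetMaxByteCount (int charCount)
	{
		return charCount;
	}

	public override int GetMaxCharCount (int byteCount)
	{
		return byteCount;
	}
}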

The I18N encodings, however, are totally internal, so nobody can override them
(when obtained using GetEncoding), so they can do whatever we want.

I saw that you use pointer-based conversion in MonoEncoding; that is good
and fast, should not be reverted, and is not affected by the bad
performance of Encoding. I think Encoding should be MS.NET compatible, but
MonoEncoding can override any methods of Encoding that are inefficient.

So I think we should hold Encoding back to provide compatibility and update
(or leave, if it's already good enough) MonoEncoding for better
performance.

Note that Encoding in MS.NET 2.0 is just as inefficient as in 1.x (if not
more so), but they created their own EncodingNLS (I guess this is that
class) to address this problem. We could do the same using MonoEncoding.

Type type = Encoding.GetEncoding(1252).GetType();
while (type != null)
{
Console.WriteLine(type.FullName);
type = type.BaseType;
}

System.Text.SBCSCodePageEncoding
System.Text.BaseCodePageEncoding
System.Text.EncodingNLS
System.Text.Encoding
System.Object

Kornél

- Original Message - 
From: Atsushi Eno [EMAIL PROTECTED]

Cc: mono-devel-list@lists.ximian.com
Sent: Monday, April 10, 2006 2:48 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
and modify some Encoding wrappers



Hi,

Kornél Pál wrote:

Hi,

Now I understand why UnicodeEncoding.GetBytes(string) is overridden in
MS.NET 1.x but not in MS.NET 2.0.



Encoding in MS.NET uses char[] to convert strings in all versions, and
they call an overload that takes char[] in GetBytes(string) as well.
(This is a difference compared to Mono, as it uses char* in 2.0.) And I
think MS realized that they should make GetBytes(string) a higher-level
wrapper just like the other ones and call GetBytes(string, int, int,
byte[], int), like the overridden method in UnicodeEncoding does.

But then they realized that this would break compatibility with MS.NET
1.x, so they dropped the modification done to Encoding.GetBytes(string)
but forgot to put back the override in UnicodeEncoding, so the 2.0
UnicodeEncoding.GetBytes(string) is actually less efficient than in 1.0.

I updated the patch to call the right method in
UnicodeEncoding.GetBytes(string).

Also note that Mono's Encoding is using the new unsafe methods for the
GetBytes overloads that take a string, but MS.NET is not doing this
optimization and uses char[] instead, which is more efficient when the new
unsafe methods are not overridden, as they convert the pointers back to
arrays by default. In addition, calling the same methods improves
compatibility.
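
A rough sketch (not the actual corlib source) of the two wrapper strategies
being compared for GetBytes over a string:

using System.Text;

static class StringWrapperSketch
{
	// Array-based routing, as described for MS.NET: works well for encodings
	// that only override the array overloads.
	public static byte [] ViaArray (Encoding e, string s)
	{
		char [] chars = s.ToCharArray ();
		return e.GetBytes (chars, 0, chars.Length);
	}

	// Pointer-based routing (2.0 only): it pays off only when the unsafe
	// pointer overload is actually overridden; the default implementation
	// copies the data back into temporary arrays first.
	public static unsafe byte [] ViaPointer (Encoding e, string s)
	{
		byte [] bytes = new byte [e.GetByteCount (s)];
		if (bytes.Length == 0)
			return bytes;
		fixed (char* charPtr = s)
		fixed (byte* bytePtr = bytes)
			e.GetBytes (charPtr, s.Length, bytePtr, bytes.Length);
		return bytes;
	}
}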


Umm, I don't

Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

2006-04-11 Thread Atsushi Eno

Hi,

I'm not interested in how your patch accomplishes MS.NET compatibility.
My question is simple: is your patch *good* for Mono?

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

public class Test
{
public static void Main (string [] args)
{
int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
string s = File.OpenText (args [0]).ReadToEnd ();
Encoding e = Encoding.Unicode;
Stopwatch sw = Stopwatch.StartNew ();
for (int i = 0; i < loop; i++)
e.GetBytes (s);
sw.Stop ();
Console.WriteLine (sw.ElapsedMilliseconds);
}
}

Before your patch:
mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
5038

After the patch:
$ rundev2 mono ./unicode.exe 
../../svn/mono/web/web/masterinfos/System.Web.xml

10175

Atsushi Eno

Kornél Pál wrote:

Hi,

I had some time and looked at all the encoding classes in I18N and in 
System.Text.


byte* and char* are used only in UnicodeEncoding, and in GetByteCount and 
GetBytes in I18N.


This means that having the #if NET_2_0 code that you don't want to remove 
will cause a performance loss on profile 2.0 in System.Text, while it will not 
improve performance in profile 1.1, as no such optimization is done there.


The solution is to use arrays in Encoding, which improves the simple, 
old-fashioned encoding classes, but to override these methods to use pointers 
in the classes that implement their core functionality using unsafe code.


Encodings in System.Text (except UnicodeEncoding) use arrays, and I think 
custom encodings created by users are array based as well, so it results in 
better performance if we use arrays in Encoding. If custom encodings are using 
unsafe code they will have to override the other methods anyway (because of 
MS.NET) to get the performance improvement.


By overriding GetByteCount (string) and GetBytes (string) in MonoEncoding, the 
performance improvement from unsafe code will be preserved; in addition, it 
will be available in all profiles.


MonoEncoding was already good, so I just added these two methods and added 
the following code to the GetBytes methods:


int byteCount = bytes.Length - byteIndex;
if (bytes.Length == 0)
bytes = new byte [1];

Some check is required because bytes[0] will fail for zero-size arrays. A 
bytes.Length == byteIndex check could avoid this (but was present in only one 
of the methods), but it would prevent ArgumentException from being thrown for 
too-small output buffers. Creating a small array is little overhead, and an 
exception will probably be thrown anyway because charCount is non-zero.


Attached is an improved patch. Please review it.

Kornél

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list