Re: [Mono-list] How many strings do I have?

2004-10-11 Thread Paolo Molaro
On 10/09/04 Jonathan Pryor wrote:
 On Fri, 2004-10-08 at 04:39, Jeroen Frijters wrote:
  Jonathan Gilbert wrote:
   Once a string is in the pool, it can NEVER be removed
   (until the AppDomain is disposed).
  
  In Java the string intern pool uses weak references. I don't know about
  .NET, but I'd hope it does the same.
 
 I believe it doesn't.  From [1]:
 
 Interning happens two ways in the CLR.
 ... 
 
 2) It happens automatically, when you load an assembly.  All the
 string literals in the assembly are intern???ed.  This is

I doubt this is true, though I'm lazy and didn't write a test.
What the current C# compiler behaviour imposes is to intern all the strings 
referenced by a method before the method is executed, so AFAIK, this is
a requirement per-method and not per-assembly.
I'm not sure why using weak references would help: the strings must continue
to exist until the appdomain is shut down anyway.

As for the original issue: mono behaves that way because it uses the
unmanaged representation of a string as the key in the hash of interned
strings. This made sense at the time when the code was written (we had 
just the interpreter). The code should be changed to use MonoString* as 
the key since this will make the string intern icalls faster.

lupus

-- 
-
[EMAIL PROTECTED] debian/rules
[EMAIL PROTECTED] Monkeys do it better
___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


RE: [Mono-list] How many strings do I have?

2004-10-09 Thread Jonathan Pryor
On Fri, 2004-10-08 at 04:39, Jeroen Frijters wrote:
 Jonathan Gilbert wrote:
  Once a string is in the pool, it can NEVER be removed
  (until the AppDomain is disposed).
 
 In Java the string intern pool uses weak references. I don't know about
 .NET, but I'd hope it does the same.

I believe it doesn't.  From [1]:

Interning happens two ways in the CLR.
... 

2) It happens automatically, when you load an assembly.  All the
string literals in the assembly are interned.  This is
expensive and  in retrospect  may have been a mistake.  In the
future we might consider allowing individual assemblies to
opt-in or opt-out

 - Jon

[1] http://blogs.msdn.com/cbrumme/archive/2003/04/22/51371.aspx

___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


RE: [Mono-list] How many strings do I have?

2004-10-08 Thread Jeroen Frijters
Jonathan Gilbert wrote:
 Once a string is in the pool, it can NEVER be removed
 (until the AppDomain is disposed).

In Java the string intern pool uses weak references. I don't know about
.NET, but I'd hope it does the same.

Regards,
Jeroen
___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


Re: [Mono-list] How many strings do I have?

2004-10-07 Thread Jonathan Gilbert
At 03:49 PM 07/10/2004 -0400, you wrote:
This is funny...

http://weblogs.asp.net/bclteam/archive/2004/09/07/226487.aspx

When I compile it with Visual Studio and run in on Windows [.NET 1.1], I get
this:

Hello
Xello
Not Found

On GNU/Linux and Mono (mcs) I get this:

Hello
Xello
string is Hello

The reason the 'switch' block doesn't work in .NET has to do with the way
'switch' on strings is implemented, as well as the implementation details
of the string intern pool.

'switch' performs about the same with strings as it does with ints. How
does it do this? It has to do with the fact that every separate object on
the system has an object ID, which can be treated as an 'int'. This alone
isn't enough, but coupled with the string intern pool, it is possible to
come up with a definitive object ID that represents a given string.

When the JIT is compiling code, by definition it is required to add every
literal string to the current AppDomain's string intern pool. This is a
very simple process; String.Intern is called for each string, and that
function operates in a simple way: It has a hash table of strings, and if a
string object with exactly the same value is not in the hash table, then it
adds it, otherwise it leaves the existing instance alone. In other words,
within a given run of a program, String.Intern(Hello) should *always*
return the same instance of the string Hello.

Java also has a string intern pool, but the language does not support
'switch' on strings. The reason for this is that it lacks a critical
function that .NET has. The string intern pool should generally be kept
clean; you don't want to go interning every string you get. However, Java's
intern pool has only one function to access it: java.lang.String::intern().
This function, like .NET's System.String::Intern(), has the post-condition
that *some* instance of the string you passed be in the pool; it adds the
string you pass if it isn't there. Once a string is in the pool, it can
NEVER be removed (until the AppDomain is disposed). To permit 'switch' on
strings without polluting the string intern pool, therefore, .NET has an
additional method, System.String::IsInterned(). It never modifies the
intern pool, and returns 'null' if the string is not in the pool. If the
string IS in the pool, however, it returns the interned instance. If the
string being checked is one of the string literals in a 'case' label for
the 'switch' block, then by definition, it will return the same instance as
the one referenced by the 'case' label (since the interpreter is required
to intern all string literals when it loads the assembly).

After this, it is a simple matter of treating the object ID of the interned
strings as integers, and doing an integer 'switch' block.

Now, the implementation of the intern pool, for best performance, is almost
always going to be a hash table. A hash table takes the hash of the key to
determine the slot into which to put the data, but more than one value of
'string' can have the same 'GetHashCode' return value, so the hash table
needs to take collisions into account. Closed hashing certainly does work,
but open hashing is a lot easier to implement. This means that each slot in
the hash table is a list; once you hash the string, you need to check that
it is actually in the list. This is where the difference between the MS.NET
and mono implementations comes into play.

From the output you paste, it looks to me like MS.NET uses a fully-managed
string intern pool. That is, it uses the same interned strings themselves
as the entries in the hash table. When the constant is changed from Hello
to Xello, the 'switch' statement's call to 'IsInterned' hashes 'Hello',
finds a list with 1 entry, but that entry's data is 'Xello' (remember that
the intern pool is using the same constant that was modified by the test
program, which is technically not a valid program). Therefore, .NET can't
find the string in the intern pool, and it returns 'null', and the 'switch'
block fails.

On the other hand, mono's implementation seems to use unmanaged code (or at
least a separate instance of the string). Inside the hash table, the
entries in the list for each slot are NOT the actual interned 'string'
objects. When the lookup is done, the instance in the list found in the
corresponding slot in the hash table still says 'Hello', even though the
interned string object now contains the text 'Xello'. As such, it is able
to find the instance, and the 'switch' block succeeds. The intern pool says
that the instance with that object ID contains the text 'Hello', which is
why 'case Hello:' is reached, even though the value of Hello is no
longer Hello :-)

Does that make sense?

Jonathan Gilbert

___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


Re: [Mono-list] How many strings do I have?

2004-10-07 Thread Marcus
On the subject of switches on strings, I tested how mcs and the Mono runtime 
performed. Specifically, I tested the code that mcs produces for switches on 
strings versus nested if-statements for a three-way decision. The 
if-statement version was substantally faster (about 3-5 times, as I recall). 
The difference was due to how mcs handles switch statements. Using a 
different C# compiler, there was no difference between if-statements and 
switch under the Mono runtime.


On Thursday 07 October 2004 6:21 pm, Jonathan Gilbert wrote:

 'switch' performs about the same with strings as it does with ints. How
 does it do this? It has to do with the fact that every separate object on
 the system has an object ID, which can be treated as an 'int'. This alone
 isn't enough, but coupled with the string intern pool, it is possible to
 come up with a definitive object ID that represents a given string.

___
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list