Re: [Mono-dev] OS X builds and -DUSE-MUNMAP

2006-12-01 Thread Allan Hsu

On Dec 1, 2006, at 2:55 PM, Miguel de Icaza wrote:

> Hello,
>
>> Nobody has objected to turning on -DUSE_MUNMAP under OS X since I
>> posted this email... Can somebody give me the go-ahead to commit the
>> configure.in change needed to switch over?
>
> Go ahead.

Committed.

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779


___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


[Mono-dev] OS X builds and -DUSE-MUNMAP

2006-11-22 Thread Allan Hsu
Does anybody have a problem with using -DUSE_MUNMAP in the CPPFLAGS  
setting in configure.in under OS X? I talked to Paolo about it at the  
developer's meeting last month and he had said that the only reason  
it wasn't turned on was because nobody had verified that it worked.  
We (imeem) have been shipping builds of Mono with this turned on for  
a few months now, so as far as I know, it does work.

-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779





Re: [Mono-dev] [PATCH] OS X MACHINE_THREAD_STATE patch for newer 10.4u SDK

2006-11-12 Thread Allan Hsu
It should be backwards compatible with older versions of the SDK.

-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779

On Nov 12, 2006, at 4:22 AM, Miguel de Icaza wrote:

> Will this patch still allow people with older versions of gcc/Xcode to
> work?  Or do we need to do some auto-detection?




[Mono-dev] [PATCH] OS X MACHINE_THREAD_STATE patch for newer 10.4u SDK

2006-10-25 Thread Allan Hsu
This patch fixes problems with building mono for i386 using the new
10.4u SDK that ships with Xcode 2.4. Current SVN builds for i386 under
the new SDK, but the resulting binary does not run properly due to
changes in MACHINE_THREAD_STATE. The break in source compatibility
looks intentional on Apple's part, and future headers will be
similarly broken.

I will be submitting the same patch to libgc upstream.

-Allan
-- 
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Index: libgc/darwin_stop_world.c
===
--- libgc/darwin_stop_world.c   (revision 66738)
+++ libgc/darwin_stop_world.c   (working copy)
@@ -75,12 +75,14 @@
   ptr_t lo, hi;
 #if defined(POWERPC)
   ppc_thread_state_t state;
+  mach_msg_type_number_t thread_state_count = PPC_THREAD_STATE_COUNT;
 #elif defined(I386)
   i386_thread_state_t state;
+  mach_msg_type_number_t thread_state_count = i386_THREAD_STATE_COUNT;
 #else
 # error FIXME for non-x86 || ppc architectures
+  mach_msg_type_number_t thread_state_count = MACHINE_THREAD_STATE_COUNT;
 #endif
-  mach_msg_type_number_t thread_state_count = MACHINE_THREAD_STATE_COUNT;
   
   me = pthread_self();
   if (!GC_thr_initialized) GC_thr_init();
@@ -94,7 +96,7 @@
       /* Get the thread state (registers, etc) */
       r = thread_get_state(
                            p->stop_info.mach_thread,
-                           MACHINE_THREAD_STATE,
+                           GC_MACH_THREAD_STATE_FLAVOR,
                            (natural_t*)&state,
                            &thread_state_count);
       if(r != KERN_SUCCESS) ABORT("thread_get_state failed");
@@ -193,7 +195,7 @@
        ppc_thread_state64_t info;
 #  endif
        mach_msg_type_number_t outCount = THREAD_STATE_MAX;
-       r = thread_get_state(thread, MACHINE_THREAD_STATE,
+       r = thread_get_state(thread, GC_MACH_THREAD_STATE_FLAVOR,
                             (natural_t *)&info, &outCount);
        if(r != KERN_SUCCESS) continue;
 
@@ -236,7 +238,7 @@
        WARN("This is completely untested and likely will not work\n", 0);
        i386_thread_state_t info;
        mach_msg_type_number_t outCount = THREAD_STATE_MAX;
-       r = thread_get_state(thread, MACHINE_THREAD_STATE,
+       r = thread_get_state(thread, GC_MACH_THREAD_STATE_FLAVOR,
                             (natural_t *)&info, &outCount);
        if(r != KERN_SUCCESS) continue;
 
Index: libgc/include/private/gc_priv.h
===
--- libgc/include/private/gc_priv.h (revision 66738)
+++ libgc/include/private/gc_priv.h (working copy)
@@ -366,6 +366,16 @@
 #   define BZERO(x,n) bzero((char *)(x),(int)(n))
 # endif
 
+#if defined(DARWIN)
+#  if defined(POWERPC)
+#  define GC_MACH_THREAD_STATE_FLAVOR PPC_THREAD_STATE
+#  elif defined(I386)
+#  define GC_MACH_THREAD_STATE_FLAVOR i386_THREAD_STATE
+#  else
+#  define GC_MACH_THREAD_STATE_FLAVOR MACHINE_THREAD_STATE
+#  endif
+#endif
+
 /* Delay any interrupts or signals that may abort this thread.  Data   */
 /* structures are in a consistent state outside this pair of calls.*/
 /* ANSI C allows both to be empty (though the standard isn't very  */
Index: libgc/os_dep.c
===
--- libgc/os_dep.c  (revision 66738)
+++ libgc/os_dep.c  (working copy)
@@ -3702,7 +3702,7 @@
             mask,
             GC_ports.exception,
             EXCEPTION_DEFAULT,
-            MACHINE_THREAD_STATE
+            GC_MACH_THREAD_STATE_FLAVOR
         );
         if(r != KERN_SUCCESS) ABORT("task_set_exception_ports failed");
 


Re: [Mono-dev] Call for release notes.

2006-10-11 Thread Allan Hsu

On Oct 11, 2006, at 1:36 PM, Miguel de Icaza wrote:

> Hey,
>
> As usual, we are getting ready for a new Mono release, and I would
> like folks to send me updates on important changes since release
> 1.1.17, my current draft is here:
>
>    www.go-mono.com/archive/1.1.18

Did my recent GC patch for OS X make it into 1.1.18? "Multithreaded
applications no longer intermittently segfault under OS X" might be a
nice addition :)

-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779





[Mono-dev] patch to fix OS X/Darwin segfaults.

2006-10-05 Thread Allan Hsu
This patch fixes a GC bug in darwin_stop_world.c where memory would  
be freed immediately before it was read. I've also submitted this  
patch to libgc upstream. Could somebody look this over and either  
give me the OK to commit or apply it themselves?


-Allan



mono-darwin-stop-world.patch
Description: Binary data


--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779





[Mono-dev] GC/threading-related mach port leak on OS X

2006-06-13 Thread Allan Hsu
I've been spending some time trying to fix a mach port leak that  
occurs under OS X. The bug (and the progress that I've been making)  
is logged here:


http://bugs.ximian.com/show_bug.cgi?id=78628

I've made a little progress by adding calls to mach_port_deallocate()  
in darwin_stop_world.c and attempting to use the libgc 6.6 release  
instead of the version of libgc that lives in Mono SVN.


I'm now a little stuck because I don't know enough about how the GC  
works to know where to look next. My most recent update in the  
bugzilla entry describes my suspicions on what I think is going on,  
but I don't know where the code in question lives (or how it works).


Can somebody with knowledge of the GC help me out here?

-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779




Re: [Mono-dev] GC/threading-related mach port leak on OS X

2006-06-13 Thread Allan Hsu

On Jun 13, 2006, at 5:16 PM, Allan Hsu wrote:

> I've been spending some time trying to fix a mach port leak that
> occurs under OS X. The bug (and the progress that I've been making)
> is logged here:
>
> http://bugs.ximian.com/show_bug.cgi?id=78628


I think I've fixed this with a patch against libgc 6.6. I've attached  
the patch to the bug report. libgc 6.6 seems to fix one of the leaks  
I was seeing with libgc in Mono SVN, but it has other port leaks and  
memory leaks that are fixed in my patch.


Now... the question is, how do I get these fixes into Mono?

-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779



Re: [Mono-dev] Async socket connection problem on FreeBSD

2006-02-12 Thread Allan Hsu
I used to have this same problem on OS X. You may want to try the  
latest release (1.1.13.2) to see if the problem still exists (it has  
at least been fixed on OS X).


-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779


On Feb 11, 2006, at 6:40 AM, Alex Chudnovsky wrote:


Hi all,

Apologies for not posting to FreeBSD specific list but my attempts  
to subscribe to it did not seem to have succeeded.


The test case below is for an issue on Mono v1.1.12 running on
FreeBSD: if an asynchronous socket connection is attempted, it never
succeeds. It just hangs and the callback never fires. The synchronous
version of connect works fine. By default the test case attempts an
async connection; if any command-line arguments are given, it
connects synchronously instead.


Any ideas would be appreciated.

/*  */
using System;
using System.Net;
using System.Net.Sockets;

namespace Majestic12
{
    /// <summary>
    /// SocketTest: test of socket connection failure on Mono running on FreeBSD
    /// </summary>
    class SocketTest
    {
        [STAThread]
        static void Main(string[] args)
        {
            bool bUseAsync=true;

            if(args.Length==0)
                Console.WriteLine("No params detected, will use ASYNC socket operation, put anything to make it use SYNCronous request");
            else
                bUseAsync=false;

            // known high-uptime host: www.bbc.co.uk
            string sIP="212.58.224.125";
            int iPort=80;

            SocketTest oST=new SocketTest();
            oST.Start(sIP,iPort,bUseAsync);
        }

        Socket oConn=null;

        void Start(string sIP,int iPort,bool bUseAsync)
        {
            Console.WriteLine("Trying to connect to {0}:{1} using {2} IO",sIP,iPort,bUseAsync ? "ASYNCronous" : "SYNCronous");

            IPEndPoint oEP=new IPEndPoint(IPAddress.Parse(sIP),iPort);

            oConn=new Socket(oEP.Address.AddressFamily,SocketType.Stream,ProtocolType.Tcp);

            if(bUseAsync)
                oConn.BeginConnect(oEP,new AsyncCallback(EndConnect),this);
            else
            {
                oConn.Connect(oEP);
                Console.WriteLine("SYNC IO successfully worked!");
            }

            Console.WriteLine("Press ENTER to exit - if you used ASYNC IO then wait for callback confirmation");
            Console.ReadLine();
        }

        /// <summary>
        /// This function will be called when using AsyncIO
        /// </summary>
        void EndConnect(IAsyncResult oAR)
        {
            Console.WriteLine("ASYNC EndConnect callback received!");

            try
            {
                SocketTest oThis=(SocketTest) oAR.AsyncState;

                oThis.oConn.EndConnect(oAR);
            }
            catch(SocketException oEx)
            {
                Console.WriteLine("SOCKET ERROR: "+oEx.ToString());
            }
            catch(Exception oEx)
            {
                Console.WriteLine("GENERAL ERROR: "+oEx.ToString());
            }
        }
    }
}





Re: [Mono-dev] mono_thread_attach/mono_thread_detach not threadsafe?

2006-01-30 Thread Allan Hsu

One thing I forgot to mention:

This code will not print the warning on OS X or segfault on Linux if  
you run it with GC_DONT_GC=1.


-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779


On Jan 30, 2006, at 7:44 PM, Allan Hsu wrote:

Lately, we've been seeing a lot of messages in the imeem OS X  
client that look like this:


** (process:23127): WARNING **: _wapi_handle_unref: Attempting to  
unref unused handle 0x7cf


These messages eventually lead to messages of this form:

** (process:23127): WARNING **: _wapi_thread_apc_pending: error  
looking up thread handle 0x2c8


I've tracked down these messages to our use of mono_thread_attach  
and mono_thread_detach; I've isolated the messages down to a small  
bit of C that mimics the Dumbarton NSThread poser (minus the  
Objective-C code):


http://www.blargle.com/~allan/racy.tar.bz2

Is this proper usage of mono_thread_attach/mono_thread_detach? The  
results that follow seem to suggest that I'm either using these two  
functions incorrectly, or these functions are not threadsafe and  
the code in the tarball is exposing some sort of race condition.


The code:

The code simply initializes the JIT and then creates 64 threads  
that call mono_thread_attach, then mono_thread_detach, then joins  
each thread and repeats the process indefinitely.
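
As a rough illustration of the structure just described (this is not the contents of racy.tar.bz2), the sketch below spawns one chunk of threads, has each thread call an attach routine and then a detach routine, and joins them all. runtime_attach_stub and runtime_detach_stub are hypothetical stand-ins that only count calls; real embedding code would call mono_thread_attach and mono_thread_detach at those points after initializing the JIT. run_chunk is likewise an assumed name.

```c
#include <pthread.h>

#define CHUNK_THREADCOUNT 64   /* threads created per round */

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int attach_calls;
static int detach_calls;

/* Hypothetical stand-in for mono_thread_attach(): only counts the call. */
static void runtime_attach_stub(void)
{
    pthread_mutex_lock(&counter_lock);
    attach_calls++;
    pthread_mutex_unlock(&counter_lock);
}

/* Hypothetical stand-in for mono_thread_detach(): only counts the call. */
static void runtime_detach_stub(void)
{
    pthread_mutex_lock(&counter_lock);
    detach_calls++;
    pthread_mutex_unlock(&counter_lock);
}

/* Body of every worker thread: attach, then immediately detach. */
static void *thread_function(void *unused)
{
    (void)unused;
    runtime_attach_stub();
    runtime_detach_stub();
    return NULL;
}

/* Start one chunk of threads and join every thread that was started.
   Returns 0 if all threads were created, -1 otherwise. */
static int run_chunk(void)
{
    pthread_t threads[CHUNK_THREADCOUNT];
    int started = 0;
    int rc = 0;

    for (int i = 0; i < CHUNK_THREADCOUNT; i++) {
        if (pthread_create(&threads[i], NULL, thread_function, NULL) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return rc;
}
```

Calling run_chunk() in an endless loop gives the repeat-forever shape of the test; with the stubs in place it exercises only thread creation and joining.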


Under Mono 1.1.13.2 on OS X 10.4.4, the sample code eventually  
generates a lot of _wapi_handle_unref g_log calls that originate  
from CloseHandle called from the finalizer thread:


#0  0x006e8688 in g_log ()
#1  0x00317130 in _wapi_handle_unref (handle=0x2863) at handles.c:827
#2  0x00317c18 in CloseHandle (handle=0x2863) at handles.c:1040
#3  0x00309d74 in ves_icall_System_Threading_Thread_Thread_free_internal (this=0x0, thread=0x10) at threads.c:555
#4  0x00064c58 in ?? ()
#5  0x00064968 in ?? ()
#6  0x0006458c in ?? ()
#7  0x0022f334 in mono_jit_runtime_invoke (method=0x111c6c0, obj=0xe4870, params=0x0, exc=0xf0103c90) at mini.c:9863
#8  0x002e5b9c in mono_runtime_invoke (method=0x0, obj=0x10, params=0x382300, exc=0x3822ec) at object.c:1346
#9  0x002b446c in run_finalize (obj=0xe4870, data=0x0) at gc.c:102
#10 0x00343920 in GC_invoke_finalizers ()
#11 0x002b5300 in finalizer_thread (unused=0x0) at gc.c:778
#12 0x003097c8 in start_wrapper (data=0x0) at threads.c:305
#13 0x0032a360 in timed_thread_start_routine (args=0x1120a40) at timed-thread.c:134
#14 0x9002b200 in _pthread_body ()

Sometimes (though rarely), this code will cause Mono on OS X to  
segfault. This will happen more often if you increase  
CHUNK_THREADCOUNT to 200 or more.


Under Mono 1.1.13.2  on (32-bit) Linux 2.6.9, the sample code  
almost immediately dies with a segfault; Mono catches the segfault  
roughly half the time. Here is a backtrace:


0x00d1c890 in pthread_kill () from /lib/tls/libpthread.so.0
(gdb) bt
#0  0x00d1c890 in pthread_kill () from /lib/tls/libpthread.so.0
#1  0x0025bb05 in GC_suspend_all () from /usr/lib/libmono.so.0
#2  0x0025bb49 in GC_suspend_all () from /usr/lib/libmono.so.0
#3  0x0025bcf7 in GC_stop_world () from /usr/lib/libmono.so.0
#4  0x0024b731 in GC_stopped_mark () from /usr/lib/libmono.so.0
#5  0x0024b3b4 in GC_try_to_collect_inner () from /usr/lib/libmono.so.0
#6  0x0024c4f3 in GC_collect_or_expand () from /usr/lib/libmono.so.0
#7  0x0024c736 in GC_allocobj () from /usr/lib/libmono.so.0
#8  0x00250ed1 in GC_generic_malloc_inner () from /usr/lib/libmono.so.0
#9  0x00250ff1 in GC_generic_malloc () from /usr/lib/libmono.so.0
#10 0x002512dd in GC_malloc () from /usr/lib/libmono.so.0
#11 0x001da3df in mono_gc_alloc_fixed () from /usr/lib/libmono.so.0
#12 0x001f3385 in mono_thread_get_pending_exception ()
   from /usr/lib/libmono.so.0
#13 0x001f3531 in mono_thread_get_pending_exception ()
   from /usr/lib/libmono.so.0
#14 0x001f0718 in mono_thread_attach () from /usr/lib/libmono.so.0
#15 0x080487f1 in thread_function ()
#16 0x00d19341 in start_thread () from /lib/tls/libpthread.so.0
#17 0x00c846fe in clone () from /lib/tls/libc.so.6

-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779




[Mono-dev] _wapi_handle_unref: what does it mean?

2006-01-28 Thread Allan Hsu

Can anybody tell me what the meaning of this console output is?

** (process:12367): WARNING **: _wapi_handle_unref: Attempting to  
unref unused handle 0xcb7
** (process:12367): WARNING **: _wapi_handle_unref: Attempting to  
unref unused handle 0xcb7
** (process:12367): WARNING **: _wapi_handle_unref: Attempting to  
unref unused handle 0xccb


 More specifically, I'd like to know just how bad it is and what  
sorts of things can cause it.


-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779




Re: [Mono-dev] embedded runtime questions

2005-09-13 Thread Allan Hsu

On Sep 12, 2005, at 6:08 AM, Paolo Molaro wrote:

> Upgrade to 1.1.9, this issue should be fixed (at least as long as you
> call mono_thread_attach()).

I've noticed that there is a matching function called
mono_thread_detach(). Do I need to call this before the thread exits?

>> [junk about using mono_thread_create]
>
> It's not fine as the thread stack is not registered with the GC so some
> objects could be freed under your back. Upgrading to 1.1.9 should not
> require this hack.

Good to know. I will stop doing this :)

> Some of the complexity is because that function is also very flexible.
> We may provide an API like the following:
>
> typedef void* MonoInvokeHandle;
>
> MonoInvokeHandle mono_runtime_prepare_invoke (MonoMethod *method);
> MonoObject*      mono_runtime_invoke_handle  (void *obj, void **params,
>                      MonoObject **exc, MonoInvokeHandle method_handle);
>
> You can easily prototype that, and test to see how much of a speedup
> it is.
> My plan is to eventually do it with a different invoke interface,
> though, because in my tests the biggest overhead with the current
> interface is that we need to allocate an object if the method returns a
> valuetype: I'd like to fix both performance issues at once.

I'll give this a try. I'll report back here with my findings. Is
there a timeline for when you want to get this sort of functionality
into Mono?

>> Full, non-cached embedded Mono C API lookup/invocation (parent
>> lookup, etc): ~6 usec
>> locally saved Mono C API (using the same MonoMethod* over and over):
>> ~2.9 usec
>> self-written caching, using Judy Arrays: ~3.2 usec
>>
>> I'm currently using a caching scheme that uses (MonoClass*, method
>> name, number of arguments) as a key that maps to MonoMethod*
>
> The lookup is going to be your bottleneck with the above interface:
> why do you need to perform it at every call?

This type of method calling is intended for a general use case where
the convenience of not requiring the caller to keep track of a
MonoMethod* outweighs the ~10% performance penalty incurred from
caching/lookup (and 10% is a whole lot better than our previous 100%
when we weren't caching at all :). It doesn't prevent the caller from
using the faster form, but that doesn't mean it shouldn't be decently
fast.
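
The percentages above can be checked against the quoted timings. The helper below is a small sketch (overhead_percent is an assumed name, not part of any Mono API) that expresses one call time as a relative overhead over a baseline, taking the ~2.9 usec "locally saved MonoMethod*" figure as the baseline.

```c
/* Relative overhead of `measured_usec` over `baseline_usec`, in percent.
   Illustrative helper only; the name is an assumption. */
static double overhead_percent(double baseline_usec, double measured_usec)
{
    return (measured_usec - baseline_usec) / baseline_usec * 100.0;
}
```

With the numbers from the thread, overhead_percent(2.9, 3.2) comes to a little over 10 percent for the cached lookup, and overhead_percent(2.9, 6.0) to a little over 100 percent for the uncached path, which matches the "~10%" and "100%" figures quoted above.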


-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779



[Mono-dev] embedded runtime questions

2005-09-09 Thread Allan Hsu
Some of us from imeem will be at PDC next week and we'll definitely  
be at the Mono meeting on Tuesday. I'd love to meet some of you guys  
and provide a look at what we're doing with Mono.


Now, on to some questions I have regarding the Mono embedded C API:

1. Under Mono 1.1.8.1 (the most recent release made for OS X), the
instructions from the Wiki entry
(http://mono-project.com/Embedding_Mono#Threading_issues) to call
mono_thread_attach don't work in all situations. I get an error
telling me to include gc.h before pthread.h, which is impossible for
me to do in the cases where the current thread was not created by my
own code.


Instead, I've been using mono_thread_create in an Objective-C  
NSThread poser class. Is it safe to do this? This function is not  
mentioned in the Wiki entry. If so, is there any additional setup/ 
teardown I need to perform? It seems to work, but I'm unsure as to  
whether or not I'm being totally clean about it.


2. Is there a facility to get a MonoMethod* that is more specific  
than mono_class_get_method_from_name? This works fine until you have  
multiple methods with the same name and the same number of arguments.  
I've been able to work around the problems I've had by tweaking my C#  
code (renaming methods, etc), but I could see this being a problem  
for people that are calling into corlib or other C# assemblies that  
are not their own.


3. Is there any way to reduce method invocation overhead past caching  
MonoMethod*s? I notice that mono_jit_runtime_invoke in mini.c emits  
and compiles an invocation wrapper with this function prototype:


MonoObject *(*runtime_invoke) (MonoObject *this, void **params,  
MonoObject **exc, void* compiled_method);


As far as I can tell, every time mono_jit_runtime_invoke is called,
it has to make sure that the MonoMethod in question is inflated and
JITed and that there is also an invocation wrapper emitted and
JITed before actually calling the runtime_invoke function. I would
love to be able to cache pointers to both the compiled method and
the invocation wrapper, so that I could do something like this,
avoiding the lookup overhead in mono_jit_runtime_invoke:


MonoObject *result = someCachedRuntimeInvoke(someObject, monoArgs,  
monoException, someCachedCompiledMethod);


Even better would be if it were possible to JIT the invocation  
wrapper in such a way that saving a pointer to the compiled method  
were not necessary.


Here are some of my informal benchmarking numbers on function
calling/message passing/method invocation overhead on a 2 GHz G5 iMac.
The numbers are average call times for nop methods called several
hundred thousand times:

Objective-C message passing: ~0.055 usec
C# method calls: ~0.04 usec
Full, non-cached embedded Mono C API lookup/invocation (parent lookup, etc): ~6 usec
Locally saved Mono C API (using the same MonoMethod* over and over): ~2.9 usec
Self-written caching, using Judy Arrays: ~3.2 usec

I'm currently using a caching scheme that uses (MonoClass*, method  
name, number of arguments) as a key that maps to MonoMethod*  
pointers. I'm hoping I can reduce call overhead further by mapping  
the same key straight to function pointers. What do you think? The  
unmanaged thunk proposal in the embedding page sounds interesting,  
but I'd be happy with something more complicated.
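
To make the caching scheme concrete, here is a minimal sketch of a cache keyed on (class pointer, method name, argument count). It deliberately swaps the Judy arrays mentioned above for a small fixed-size open-addressing hash table, and it stores opaque pointers so that it compiles without the Mono headers; in real use the key would hold a MonoClass* and the value a MonoMethod*. All names here (cache_entry, cache_get, cache_put, CACHE_SLOTS) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 256   /* must be a power of two */

typedef struct {
    const void *klass;    /* would be a MonoClass* in real use */
    const char *name;     /* not copied: caller keeps the string alive */
    int argc;             /* number of arguments */
    void *method;         /* would be a MonoMethod*; NULL marks an empty slot */
} cache_entry;

static cache_entry cache[CACHE_SLOTS];

/* Mix the three key parts into a slot index. */
static size_t cache_hash(const void *klass, const char *name, int argc)
{
    size_t h = (size_t)(uintptr_t)klass;
    for (const char *p = name; *p != '\0'; p++)
        h = h * 31u + (unsigned char)*p;
    return (h * 31u + (size_t)argc) & (CACHE_SLOTS - 1);
}

static int cache_key_matches(const cache_entry *e, const void *klass,
                             const char *name, int argc)
{
    return e->klass == klass && e->argc == argc && strcmp(e->name, name) == 0;
}

/* Return the cached pointer for the key, or NULL if it is not present. */
static void *cache_get(const void *klass, const char *name, int argc)
{
    size_t i = cache_hash(klass, name, argc);
    for (size_t probes = 0; probes < CACHE_SLOTS; probes++) {
        const cache_entry *e = &cache[i];
        if (e->method == NULL)
            return NULL;                  /* empty slot ends the probe chain */
        if (cache_key_matches(e, klass, name, argc))
            return e->method;
        i = (i + 1) & (CACHE_SLOTS - 1);  /* linear probing */
    }
    return NULL;
}

/* Insert or overwrite an entry. Returns 0 on success, -1 if the table is
   full or `method` is NULL (NULL is reserved for empty slots). */
static int cache_put(const void *klass, const char *name, int argc, void *method)
{
    if (method == NULL)
        return -1;
    size_t i = cache_hash(klass, name, argc);
    for (size_t probes = 0; probes < CACHE_SLOTS; probes++) {
        cache_entry *e = &cache[i];
        if (e->method == NULL || cache_key_matches(e, klass, name, argc)) {
            e->klass = klass;
            e->name = name;
            e->argc = argc;
            e->method = method;
            return 0;
        }
        i = (i + 1) & (CACHE_SLOTS - 1);
    }
    return -1;
}
```

A key built this way has the same limitation raised in question 2: two overloads with the same name and the same number of arguments map to a single entry, so the second cache_put simply overwrites the first.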


See you guys next week at PDC.

-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779




Re: [Mono-dev] Mono on OSX 10.4 (Cocoa and Threading)

2005-09-02 Thread Allan Hsu



On Sep 1, 2005, at 10:52 PM, Frank Bergmann wrote:

> I must say I had trouble creating a test case (since it is a rather big
> project). I found out the following though: Under Win32 I used the Mutex
> class during thread spawning to ensure thread safety. Under OS X these
> Mutexes caused a deadlock. Even after creation the first WaitOne() would
> not return. Writing up a test case did not work out again, as every
> simple use of the Mutex class did what it was supposed to do. So for now
> I removed them and watch my step.



I suspect there may be a bug in the mono Mutex/Monitor implementation  
under OS X. I have experienced similar deadlocks. The toy code I  
posted for bug #75558 sometimes exhibits deadlocking behaviour under  
10.4.2:


http://bugs.ximian.com/show_bug.cgi?id=75558

-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779



Re: [Mono-devel-list] plans for a native AES wrapper.

2005-07-21 Thread Allan Hsu

On Jul 21, 2005, at 10:04 AM, Paolo Molaro wrote:

> As explained a few days ago in another mail: internal calls have
> nothing to do with speed. They only belong inside the general
> purpose mono if they are generally useful or in your specialized
> embedding app. So they are not appropriate as a substitute for
> pinvoking into an unmanaged lib.
>
> lupus

I think I may have misread the part of the wiki that talks about
embedding mono (http://mono-project.com/Embedding_Mono):

"The Mono runtime provides two mechanisms to expose C code to the CIL
universe: internal calls and native C code. Internal calls are
tightly integrated with the runtime, and have the least overhead, as
they use the same data types that the runtime uses.
The other option is to use the Platform Invoke (P/Invoke) to call C
code from the CIL universe, using the standard P/Invoke mechanisms."


Does that text actually list *three* options, not two? It also seems  
to suggest that internal calls are faster than *something*.  I had  
previously thought that it was comparing native calls to p/invoke.  
Was I wrong?


We've written both a p/invoke and an internal call Rijndael wrapper.  
There is a negligible performance difference between the two. We may  
still ship the internal call implementation for reasons unrelated to  
performance; it reduces our library dependencies and makes it harder  
to do something malicious via library substitution.


-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779



[Mono-devel-list] plans for a native AES wrapper.

2005-07-18 Thread Allan Hsu
After last week's AES benchmarking, we've decided to write a  
managed-native wrapper around the openssl libcrypto library for the  
sake of performance. From my experience with embedded mono, it seems  
straightforward enough to write a RijndaelNative class that contains  
method declarations marked as internal calls that I will register at  
runtime. This will work fine for the cases where I'm embedding mono  
inside a native application, but I don't know enough about mono to  
know if I can use this same strategy for situations in which I'm not  
embedding mono. Is it possible to register internal calls at runtime  
when running mono like a normal, sane person? Will this be any faster  
than using p/Invoke?


-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779




Re: [Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)

2005-07-18 Thread Allan Hsu

On Jul 18, 2005, at 2:59 AM, Paolo Molaro wrote:

> On 07/15/05 Allan Hsu wrote:
>> Is there any reference on what sorts of things you can change using
>> mono_set_defaults? Following the mono source for references to that
>> function wasn't particularly enlightening. It would be useful if the
>
> grep mono_set_defaults *.c
> mini.c:mono_set_defaults (int verbose_level, guint32 opts)
> Should be pretty evident. Just always use the result of
> mono_parse_default_optimizations (NULL) as the opts value.

I understood the verbose_level parameter, but the opts parameter was
what mystified me. I should have been more specific about what I was
looking for. At the time, I didn't understand the value that
mono_parse_default_optimizations() returns or what values you can
pass in to affect it. I've since traced it back to the relevant code
in driver.c and the mini-X.c platform code and now see how it works.
Is it safe to mess with those parameters, or will it cause undefined
results?

>> To be fair, the native implementation is able to take advantage of
>> 64-bit processors when available, while all mono builds in the above
>> benchmarks are 32-bit. The Windows XP machine is the standard 32-bit
>> install, even though the processor is 64-bit. This is a pretty
>> informal benchmark, but all I'm interested in showing here is how bad
>> the AES performance under mono is.
>
> The current implementation causes lots of spilling and other
> unnecessary work which the jit doesn't remove (the work massi is
> doing should improve this). Some parts of it can be easily changed
> to use unsafe code and that should improve performance a lot: I'll
> leave that to Sebastien :-)

This is good to hear. I hope the benchmarking I did will provide some
information that somebody will find useful.

For my specific application, there is no such thing as enough
performance :) I plan on writing a managed wrapper around libcrypto
for this reason. This will be the subject of another email.

>>> Some of the data looks definitely bogus: it reports a stall even on
>>> the addi, here:
>>>
>>>         0x2e143c8  lwz     r4,32(r1)             3:1  Stall=2
>>>         0x2e143cc  lwz     r5,12(r4)             3:1  Stall=2
>>>         0x2e143d0  cmplwi  r5,0x                 3:1  Stall=2
>>>         0x2e143d4  blel    $+696 0x2e1468c [8B]  2:1
>>>   0.4%  0x2e143d8  addi    r4,r4,16              2:1  Stall=1
>>
>> [...]
>>
>> As for the stall statistics, you have misread them. Each line that
>> says Stall=N is saying that the instruction latency of the marked
>> instruction will cause a subsequent dependent instruction to stall,
>> not that the marked instruction itself will stall. N is the maximum
>> number of stall cycles for the nearest dependent instruction. The
>
> Since the tool reports that the addi stalls only sometimes (check the
> similar code sequences where no stall is reported), my take
> is that your interpretation or the data reported is not correct.

I'm not sure if my meaning came across. The line next to the addi
instruction that says Stall=1 means that a dependent instruction
*following* the addi looks like it will stall while waiting for the
results from addi, not that the addi instruction itself will stall.
The code that follows that specific instruction looks like this:

  0.4%  0x2e143d8  addi  r4,r4,16   2:1  Stall=1
        0x2e143dc  lbz   r4,0(r4)   3:1  Stall=2
        0x2e143e0  add   r3,r3,r4   2:1  Stall=1
        0x2e143e4  stw   r3,44(r1)  3:1

The instruction latency of the addi instruction is 2 cycles; the lbz
that immediately follows the addi is dependent on the addi. The lbz
will stall for 1 cycle. That is what the Shark output is trying to say.
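
The arithmetic behind that reading can be sketched as follows. This is a deliberately simplified model, assumed for illustration only (one instruction issued per cycle, in order, no forwarding tricks), and it reads the first number in Shark's "2:1" column as the latency in cycles; stall_cycles is a hypothetical helper, not anything Shark exposes.

```c
/* Cycles a dependent instruction has to wait under a simplified in-order,
   one-issue-per-cycle model.
   producer_latency: cycles until the producing instruction's result is ready.
   distance: how many instructions later the consumer issues
             (1 = the instruction immediately following the producer). */
static int stall_cycles(int producer_latency, int distance)
{
    int wait = producer_latency - distance;
    return wait > 0 ? wait : 0;
}
```

Under this model stall_cycles(2, 1) gives the Stall=1 shown next to the addi, and stall_cycles(3, 1) gives the Stall=2 shown next to the lwz and lbz lines, each of which is followed directly by an instruction that uses its result. A consumer placed far enough away, for example stall_cycles(2, 3), does not stall at all.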


-Allan

--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779


[Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)

2005-07-15 Thread Allan Hsu

On Jul 15, 2005, at 3:39 AM, Paolo Molaro wrote:


On 07/14/05 Allan Hsu wrote:


Code generated by the PPC code emitter performs very poorly in
comparison to the same code emitted for other platforms (most
notably, x86). I had a brief conversation about this with Miguel in
#mono today and he suggested that I post some examples.



I'm sure he meant an actual test case, which you didn't provide.


I apologize for that. I was sharing the information I had already  
gathered as part of an investigation into the poor performance of the  
OS X port of our product. I was not sure if this sort of data was  
useful or if, as seems the case, I was doing something wrong. It  
looks like the performance problems I was running into are not  
specific to PPC, but the lack of JIT optimization (which I've  
remedied) made them *very* apparent.



Preliminary profiling with Shark (a profiling tool that is part of
the Apple CHUD tools) shows some heinously inefficient JIT output on
both G4 and G5 machines. Here's some sample Shark analysis on the
code emitted by mono 1.1.8.1 from
System.Security.Cryptography.RijndaelTransform.ECB(byte[], byte[])
and System.Security.Cryptography.RijndaelTransform.ShiftRow(bool):

http://strangecargo.org/~allan/mono/



It looks like optimizations are not enabled: are you embedding mono
in your app?
You should try adding:
mono_set_defaults (0, mono_parse_default_optimizations (NULL));
before the call to mono_jit_init ().


I am indeed using embedded mono, and I was not at all aware that
optimizations were disabled by default. The mono_set_defaults call does
not appear in any of the sample code that I've seen, and this is the
first I've heard of it.


Is there any reference on what sorts of things you can change using  
mono_set_defaults? Following the mono source for references to that  
function wasn't particularly enlightening. It would be useful if the  
Wiki page on embedding mono mentioned JIT optimization.


I have done some more isolated testing of AES performance after  
turning on optimization and it seems that the JIT-emitted PPC code is  
roughly on par with x86 mono performance. Here is the code I used for  
some simple benchmarking:


http://strangecargo.org/~allan/mono/aes.tar.bz2

Here are some times for 1000 encrypts/decrypts of 32768 byte chunks
from some machines we have here in the office, ordered from slowest to
fastest:

57.7 seconds under mono 1.1.8.1, OS X 10.4.2 (1.67 GHz G4 1.2)
55.0 seconds under mono 1.1.8.1, Linux 2.6.9 (1.8 GHz Athlon XP 2500+)
45.8 seconds under mono 1.1.8.1, Linux 2.6.9 (2.2 GHz Athlon 64 3200+)
42.4 seconds under mono 1.1.8.1, OS X 10.4.2 (2.0 GHz G5 3.0)
9.01 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 GHz Athlon 64 3200+)


If you look at the benchmark code, it uses RijndaelManaged to do  
encrypt/decrypt. This class is supposedly 100% managed code in the  
Microsoft implementation.


Included in the tarball is some native code that links against  
OpenSSL to do the same thing. This is what native performance for the  
same sized chunks looks like:


1.67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (1.8 GHz Athlon XP 2500+)
1.44 seconds under OpenSSL 0.9.7, OS X 10.4.2 (1.67 GHz G4 1.2)
1.05 seconds under OpenSSL 0.9.7, OS X 10.4.2 (2.0 GHz G5 3.0)
0.67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (2.2 GHz Athlon 64 3200+)

To be fair, the native implementation is able to take advantage of
64-bit processors when available, while all mono builds in the above
benchmarks are 32-bit. The Windows XP machine is the standard 32-bit  
install, even though the processor is 64-bit. This is a pretty  
informal benchmark, but all I'm interested in showing here is how bad  
the AES performance under mono is.


It was suggested in #mono that I try compiling the mono AES  
implementation under VS.NET and run it under the Microsoft VM to  
compare performance.

The resulting project is available here:
http://strangecargo.org/~allan/mono/AESSpeedTest.zip

The same operation benchmarks thusly:
22.76 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 GHz Athlon 64 3200+)


The AES code is taken from mono svn, so it may be different from the  
code used in the mono 1.1.8.1 benchmarks above.


While switching to the Microsoft VM boosts speed significantly, it  
looks like significant gains could be made by optimizing the mono  
RijndaelManaged code.


(some insightful comment would go here if I weren't so tired of  
writing this email).


-Allan

Everything below doesn't matter so much, since it was based on
information gathered from unoptimized JIT output.

Information on how to read Shark analysis comes with Shark (available
for free from the Apple Developer Connection website).



A direct pointer to the doc would be useful.


Unfortunately, I can't find a copy of the documentation that's  
available online (otherwise, I would have linked it). The closest  
thing I can find to online documentation is this document:
http://developer.apple.com/tools

[Mono-devel-list] poor PPC JIT output

2005-07-14 Thread Allan Hsu
Code generated by the PPC code emitter performs very poorly in  
comparison to the same code emitted for other platforms (most  
notably, x86). I had a brief conversation about this with Miguel in  
#mono today and he suggested that I post some examples.


Preliminary profiling with Shark (a profiling tool that is part of  
the Apple CHUD tools) shows some heinously inefficient JIT output on  
both G4 and G5 machines. Here's some sample Shark analysis on the  
code emitted by mono 1.1.8.1 from  
System.Security.Cryptography.RijndaelTransform.ECB(byte[], byte[])  
and System.Security.Cryptography.RijndaelTransform.ShiftRow(bool):


http://strangecargo.org/~allan/mono/

Information on how to read Shark analysis comes with Shark (available  
for free from the Apple Developer Connection website). (A summary:  
numerous and frequent pipeline stalls, unoptimized loops).


Is there any active effort to optimize the PPC code emitter? The  
above two methods account for the majority of CPU time on a pegged  
2 GHz G5 while decrypting AES blocks coming off the wire. The x86
machine encrypting the data (also running mono) doesn't even break a  
sweat.


-Allan
--
Allan Hsu allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300  1457 AC37 CBBB 0E92 C779

