Re: [Mono-dev] OS X builds and -DUSE-MUNMAP
On Dec 1, 2006, at 2:55 PM, Miguel de Icaza wrote:
>> Hello,
>>
>> Nobody has objected to turning on -DUSE_MUNMAP under OS X since I posted this email... Can somebody give me the go-ahead to commit the configure.in change needed to switch over?
>
> Go ahead.

Committed.

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
[Mono-dev] OS X builds and -DUSE-MUNMAP
Does anybody have a problem with using -DUSE_MUNMAP in the CPPFLAGS setting in configure.in under OS X? I talked to Paolo about it at the developer's meeting last month and he said that the only reason it wasn't turned on was that nobody had verified that it worked. We (imeem) have been shipping builds of Mono with this turned on for a few months now, so as far as I know, it does work.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-dev] [PATCH] OS X MACHINE_THREAD_STATE patch for newer 10.4u SDK
It should be backwards compatible with older versions of the SDK.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779

On Nov 12, 2006, at 4:22 AM, Miguel de Icaza wrote:
> Will this patch still allow people with older versions of gcc/Xcode to work? Or do we need to do some auto-detection?
[Mono-dev] [PATCH] OS X MACHINE_THREAD_STATE patch for newer 10.4u SDK
This patch fixes problems with building mono for i386 using the new 10.4u SDK that ships with Xcode 2.4. Current SVN builds for i386 under the new SDK, but does not run properly due to changes in MACHINE_THREAD_STATE. It looks like the breakage in source compatibility is intentional on Apple's part and future headers will be similarly broken. I will be submitting the same patch to libgc upstream.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779

Index: libgc/darwin_stop_world.c
===================================================================
--- libgc/darwin_stop_world.c	(revision 66738)
+++ libgc/darwin_stop_world.c	(working copy)
@@ -75,12 +75,14 @@
   ptr_t lo, hi;
 #if defined(POWERPC)
   ppc_thread_state_t state;
+  mach_msg_type_number_t thread_state_count = PPC_THREAD_STATE_COUNT;
 #elif defined(I386)
   i386_thread_state_t state;
+  mach_msg_type_number_t thread_state_count = i386_THREAD_STATE_COUNT;
 #else
 # error FIXME for non-x86 || ppc architectures
+  mach_msg_type_number_t thread_state_count = MACHINE_THREAD_STATE_COUNT;
 #endif
-  mach_msg_type_number_t thread_state_count = MACHINE_THREAD_STATE_COUNT;

   me = pthread_self();
   if (!GC_thr_initialized) GC_thr_init();
@@ -94,7 +96,7 @@
       /* Get the thread state (registers, etc) */
       r = thread_get_state(p->stop_info.mach_thread,
-                           MACHINE_THREAD_STATE,
+                           GC_MACH_THREAD_STATE_FLAVOR,
                           (natural_t*)&state,
                           &thread_state_count);
       if(r != KERN_SUCCESS) ABORT("thread_get_state failed");
@@ -193,7 +195,7 @@
       ppc_thread_state64_t info;
 #     endif
       mach_msg_type_number_t outCount = THREAD_STATE_MAX;
-      r = thread_get_state(thread, MACHINE_THREAD_STATE,
+      r = thread_get_state(thread, GC_MACH_THREAD_STATE_FLAVOR,
                            (natural_t *)&info, &outCount);
       if(r != KERN_SUCCESS) continue;
@@ -236,7 +238,7 @@
       WARN("This is completely untested and likely will not work\n", 0);
       i386_thread_state_t info;
       mach_msg_type_number_t outCount = THREAD_STATE_MAX;
-      r = thread_get_state(thread, MACHINE_THREAD_STATE,
+      r = thread_get_state(thread, GC_MACH_THREAD_STATE_FLAVOR,
                            (natural_t *)&info, &outCount);
       if(r != KERN_SUCCESS) continue;

Index: libgc/include/private/gc_priv.h
===================================================================
--- libgc/include/private/gc_priv.h	(revision 66738)
+++ libgc/include/private/gc_priv.h	(working copy)
@@ -366,6 +366,16 @@
 #   define BZERO(x,n) bzero((char *)(x),(int)(n))
 # endif

+#if defined(DARWIN)
+# if defined(POWERPC)
+#  define GC_MACH_THREAD_STATE_FLAVOR PPC_THREAD_STATE
+# elif defined(I386)
+#  define GC_MACH_THREAD_STATE_FLAVOR i386_THREAD_STATE
+# else
+#  define GC_MACH_THREAD_STATE_FLAVOR MACHINE_THREAD_STATE
+# endif
+#endif
+
 /* Delay any interrupts or signals that may abort this thread. Data */
 /* structures are in a consistent state outside this pair of calls. */
 /* ANSI C allows both to be empty (though the standard isn't very   */

Index: libgc/os_dep.c
===================================================================
--- libgc/os_dep.c	(revision 66738)
+++ libgc/os_dep.c	(working copy)
@@ -3702,7 +3702,7 @@
         mask,
         GC_ports.exception,
         EXCEPTION_DEFAULT,
-        MACHINE_THREAD_STATE
+        GC_MACH_THREAD_STATE_FLAVOR
         );
     if(r != KERN_SUCCESS) ABORT("task_set_exception_ports failed");
Re: [Mono-dev] Call for release notes.
On Oct 11, 2006, at 1:36 PM, Miguel de Icaza wrote:
> Hey,
>
> As usual, we are getting ready for a new Mono release, and I would like folks to send me updates on important changes since release 1.1.17, my current draft is here:
>
> www.go-mono.com/archive/1.1.18

Did my recent GC patch for OS X make it into 1.1.18? "Multithreaded applications no longer intermittently segfault under OS X" might be a nice addition :)

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-dev] patch to fix OS X/Darwin segfaults.
This patch fixes a GC bug in darwin_stop_world.c where memory would be freed immediately before it was read. I've also submitted this patch to libgc upstream. Could somebody look this over and either give me the OK to commit or apply it themselves?

-Allan

mono-darwin-stop-world.patch
Description: Binary data

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-dev] GC/threading-related mach port leak on OS X
I've been spending some time trying to fix a mach port leak that occurs under OS X. The bug (and the progress that I've been making) is logged here:

http://bugs.ximian.com/show_bug.cgi?id=78628

I've made a little progress by adding calls to mach_port_deallocate() in darwin_stop_world.c and attempting to use the libgc 6.6 release instead of the version of libgc that lives in Mono SVN. I'm now a little stuck because I don't know enough about how the GC works to know where to look next. My most recent update in the bugzilla entry describes my suspicions on what I think is going on, but I don't know where the code in question lives (or how it works).

Can somebody with knowledge of the GC help me out here?

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-dev] GC/threading-related mach port leak on OS X
On Jun 13, 2006, at 5:16 PM, Allan Hsu wrote:
> I've been spending some time trying to fix a mach port leak that occurs under OS X. The bug (and the progress that I've been making) is logged here:
>
> http://bugs.ximian.com/show_bug.cgi?id=78628

I think I've fixed this with a patch against libgc 6.6. I've attached the patch to the bug report. libgc 6.6 seems to fix one of the leaks I was seeing with libgc in Mono SVN, but it has other port leaks and memory leaks that are fixed in my patch.

Now... the question is, how do I get these fixes into Mono?

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-dev] Async socket connection problem on FreeBSD
I used to have this same problem on OS X. You may want to try the latest release (1.1.13.2) to see if the problem still exists (it has at least been fixed on OS X).

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779

On Feb 11, 2006, at 6:40 AM, Alex Chudnovsky wrote:
> Hi all,
>
> Apologies for not posting to the FreeBSD-specific list, but my attempts to subscribe to it do not seem to have succeeded. The test case below is for an issue on Mono v1.1.12 running on FreeBSD - basically, if an asynchronous socket connection is made then it never succeeds - it just hangs out there and the callback never happens. The synchronous version of connect works fine - in the test case an async connection will be attempted by default, but if any command line arguments are used then a synchronous one will be done.
>
> Any ideas would be appreciated.
>
> /* */
> using System;
> using System.Net;
> using System.Net.Sockets;
>
> namespace Majestic12
> {
>     /// <summary>
>     /// SocketTest: test of socket connection failure on Mono running on FreeBSD
>     /// </summary>
>     class SocketTest
>     {
>         [STAThread]
>         static void Main(string[] args)
>         {
>             bool bUseAsync = true;
>
>             if(args.Length == 0)
>                 Console.WriteLine("No params detected, will use ASYNC socket operation, put anything to make it use SYNCronous request");
>             else
>                 bUseAsync = false;
>
>             // known high-uptime host: www.bbc.co.uk
>             string sIP = "212.58.224.125";
>             int iPort = 80;
>
>             SocketTest oST = new SocketTest();
>             oST.Start(sIP, iPort, bUseAsync);
>         }
>
>         Socket oConn = null;
>
>         void Start(string sIP, int iPort, bool bUseAsync)
>         {
>             Console.WriteLine("Trying to connect to {0}:{1} using {2} IO", sIP, iPort, bUseAsync ? "ASYNCronous" : "SYNCronous");
>
>             IPEndPoint oEP = new IPEndPoint(IPAddress.Parse(sIP), iPort);
>
>             oConn = new Socket(oEP.Address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
>
>             if(bUseAsync)
>                 oConn.BeginConnect(oEP, new AsyncCallback(EndConnect), this);
>             else
>             {
>                 oConn.Connect(oEP);
>                 Console.WriteLine("SYNC IO successfully worked!");
>             }
>
>             Console.WriteLine("Press ENTER to exit - if you used ASYNC IO then wait for callback confirmation");
>             Console.ReadLine();
>         }
>
>         /// <summary>
>         /// This function will be called when using AsyncIO
>         /// </summary>
>         void EndConnect(IAsyncResult oAR)
>         {
>             Console.WriteLine("ASYNC EndConnect callback received!");
>
>             try
>             {
>                 SocketTest oThis = (SocketTest) oAR.AsyncState;
>                 oThis.oConn.EndConnect(oAR);
>             }
>             catch(SocketException oEx)
>             {
>                 Console.WriteLine("SOCKET ERROR: " + oEx.ToString());
>             }
>             catch(Exception oEx)
>             {
>                 Console.WriteLine("GENERAL ERROR: " + oEx.ToString());
>             }
>         }
>     }
> }
Re: [Mono-dev] mono_thread_attach/mono_thread_detach not threadsafe?
One thing I forgot to mention: this code will not print the warning on OS X or segfault on Linux if you run it with GC_DONT_GC=1.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779

On Jan 30, 2006, at 7:44 PM, Allan Hsu wrote:
> Lately, we've been seeing a lot of messages in the imeem OS X client that look like this:
>
> ** (process:23127): WARNING **: _wapi_handle_unref: Attempting to unref unused handle 0x7cf
>
> These messages eventually lead to messages of this form:
>
> ** (process:23127): WARNING **: _wapi_thread_apc_pending: error looking up thread handle 0x2c8
>
> I've tracked down these messages to our use of mono_thread_attach and mono_thread_detach; I've isolated the messages down to a small bit of C that mimics the Dumbarton NSThread poser (minus the Objective-C code):
>
> http://www.blargle.com/~allan/racy.tar.bz2
>
> Is this proper usage of mono_thread_attach/mono_thread_detach? The results that follow seem to suggest that I'm either using these two functions incorrectly, or these functions are not threadsafe and the code in the tarball is exposing some sort of race condition.
>
> The code simply initializes the JIT and then creates 64 threads that call mono_thread_attach, then mono_thread_detach, then joins each thread and repeats the process indefinitely.
>
> Under Mono 1.1.13.2 on OS X 10.4.4, the sample code eventually generates a lot of _wapi_handle_unref g_log calls that originate from CloseHandle called from the finalizer thread:
>
> #0  0x006e8688 in g_log ()
> #1  0x00317130 in _wapi_handle_unref (handle=0x2863) at handles.c:827
> #2  0x00317c18 in CloseHandle (handle=0x2863) at handles.c:1040
> #3  0x00309d74 in ves_icall_System_Threading_Thread_Thread_free_internal (this=0x0, thread=0x10) at threads.c:555
> #4  0x00064c58 in ?? ()
> #5  0x00064968 in ?? ()
> #6  0x0006458c in ?? ()
> #7  0x0022f334 in mono_jit_runtime_invoke (method=0x111c6c0, obj=0xe4870, params=0x0, exc=0xf0103c90) at mini.c:9863
> #8  0x002e5b9c in mono_runtime_invoke (method=0x0, obj=0x10, params=0x382300, exc=0x3822ec) at object.c:1346
> #9  0x002b446c in run_finalize (obj=0xe4870, data=0x0) at gc.c:102
> #10 0x00343920 in GC_invoke_finalizers ()
> #11 0x002b5300 in finalizer_thread (unused=0x0) at gc.c:778
> #12 0x003097c8 in start_wrapper (data=0x0) at threads.c:305
> #13 0x0032a360 in timed_thread_start_routine (args=0x1120a40) at timed-thread.c:134
> #14 0x9002b200 in _pthread_body ()
>
> Sometimes (though rarely), this code will cause Mono on OS X to segfault. This will happen more often if you increase CHUNK_THREADCOUNT to 200 or more.
>
> Under Mono 1.1.13.2 on (32-bit) Linux 2.6.9, the sample code almost immediately dies with a segfault; Mono catches the segfault roughly half the time. Here is a backtrace:
>
> 0x00d1c890 in pthread_kill () from /lib/tls/libpthread.so.0
> (gdb) bt
> #0  0x00d1c890 in pthread_kill () from /lib/tls/libpthread.so.0
> #1  0x0025bb05 in GC_suspend_all () from /usr/lib/libmono.so.0
> #2  0x0025bb49 in GC_suspend_all () from /usr/lib/libmono.so.0
> #3  0x0025bcf7 in GC_stop_world () from /usr/lib/libmono.so.0
> #4  0x0024b731 in GC_stopped_mark () from /usr/lib/libmono.so.0
> #5  0x0024b3b4 in GC_try_to_collect_inner () from /usr/lib/libmono.so.0
> #6  0x0024c4f3 in GC_collect_or_expand () from /usr/lib/libmono.so.0
> #7  0x0024c736 in GC_allocobj () from /usr/lib/libmono.so.0
> #8  0x00250ed1 in GC_generic_malloc_inner () from /usr/lib/libmono.so.0
> #9  0x00250ff1 in GC_generic_malloc () from /usr/lib/libmono.so.0
> #10 0x002512dd in GC_malloc () from /usr/lib/libmono.so.0
> #11 0x001da3df in mono_gc_alloc_fixed () from /usr/lib/libmono.so.0
> #12 0x001f3385 in mono_thread_get_pending_exception () from /usr/lib/libmono.so.0
> #13 0x001f3531 in mono_thread_get_pending_exception () from /usr/lib/libmono.so.0
> #14 0x001f0718 in mono_thread_attach () from /usr/lib/libmono.so.0
> #15 0x080487f1 in thread_function ()
> #16 0x00d19341 in start_thread () from /lib/tls/libpthread.so.0
> #17 0x00c846fe in clone () from /lib/tls/libc.so.6
>
> -Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-dev] _wapi_handle_unref: what does it mean?
Can anybody tell me what the meaning of this console output is?

** (process:12367): WARNING **: _wapi_handle_unref: Attempting to unref unused handle 0xcb7
** (process:12367): WARNING **: _wapi_handle_unref: Attempting to unref unused handle 0xcb7
** (process:12367): WARNING **: _wapi_handle_unref: Attempting to unref unused handle 0xccb

More specifically, I'd like to know just how bad it is and what sorts of things can cause it.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-dev] embedded runtime questions
On Sep 12, 2005, at 6:08 AM, Paolo Molaro wrote:
> Upgrade to 1.1.9, this issue should be fixed (at least as long as you call mono_thread_attach()).

I've noticed that there is a matching function called mono_thread_detach(). Do I need to call this before the thread exits?

> [junk about using mono_thread_create]
>
> It's not fine as the thread stack is not registered with the GC so some objects could be freed under your back. Upgrading to 1.1.9 should not require this hack.

Good to know. I will stop doing this :)

> Some of the complexity is because that function is also very flexible. We may provide an API like the following:
>
> typedef void* MonoInvokeHandle;
> MonoInvokeHandle mono_runtime_prepare_invoke (MonoMethod *method);
> MonoObject* mono_runtime_invoke_handle (void *obj, void **params, MonoObject **exc, MonoInvokeHandle method_handle);
>
> You can easily prototype that, and test to see how much of a speedup it is. My plan is to eventually do it with a different invoke interface, though, because in my tests the biggest overhead with the current interface is that we need to allocate an object if the method returns a valuetype: I'd like to fix both performance issues at once.

I'll give this a try. I'll report back here with my findings. Is there a timeline for when you want to get this sort of functionality into Mono?

>> Full, non-cached embedded Mono C API lookup/invocation (parent lookup, etc): ~6 usec
>> locally saved Mono C API (using the same MonoMethod* over and over): ~2.9 usec
>> self-written caching, using Judy Arrays: ~3.2 usec
>>
>> I'm currently using a caching scheme that uses (MonoClass*, method name, number of arguments) as a key that maps to MonoMethod*
>
> The lookup is going to be your bottleneck with the above interface: why do you need to perform it at every call?

This type of method calling is intended for a general use case where the convenience of not requiring the caller to keep track of a MonoMethod* outweighs the ~10% performance penalty incurred from caching/lookup (and 10% is a whole lot better than our previous 100% when we weren't caching at all :). It doesn't prevent the caller from using the faster form, but that doesn't mean it shouldn't be decently fast.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-dev] embedded runtime questions
Some of us from imeem will be at PDC next week and we'll definitely be at the Mono meeting on Tuesday. I'd love to meet some of you guys and provide a look at what we're doing with Mono. Now, on to some questions I have regarding the Mono embedded C API:

1. Under Mono 1.1.8.1 (the most recent release made for OS X), the instructions from the Wiki entry (http://mono-project.com/Embedding_Mono#Threading_issues) to call mono_thread_attach don't work in all situations. I get an error telling me to include gc.h before pthread.h, which is impossible for me to do in the cases where the current thread was not created by my own code. Instead, I've been using mono_thread_create in an Objective-C NSThread poser class. Is it safe to do this? This function is not mentioned in the Wiki entry. If so, is there any additional setup/teardown I need to perform? It seems to work, but I'm unsure as to whether or not I'm being totally clean about it.

2. Is there a facility to get a MonoMethod* that is more specific than mono_class_get_method_from_name? This works fine until you have multiple methods with the same name and the same number of arguments. I've been able to work around the problems I've had by tweaking my C# code (renaming methods, etc), but I could see this being a problem for people calling into corlib or other C# assemblies that are not their own.

3. Is there any way to reduce method invocation overhead beyond caching MonoMethod*s? I notice that mono_jit_runtime_invoke in mini.c emits and compiles an invocation wrapper with this function prototype:

MonoObject *(*runtime_invoke) (MonoObject *this, void **params, MonoObject **exc, void* compiled_method);

As far as I can tell, every time mono_jit_runtime_invoke is called, it has to make sure that the MonoMethod in question is inflated and JITed and that there is also an invocation wrapper emitted and JITed before actually calling the runtime_invoke function. I would love to be able to cache pointers to both the compiled method as well as the invocation wrapper, so that I could do something like this, avoiding the lookup overhead in mono_jit_runtime_invoke:

MonoObject *result = someCachedRuntimeInvoke(someObject, monoArgs, monoException, someCachedCompiledMethod);

Even better would be if it were possible to JIT the invocation wrapper in such a way that saving a pointer to the compiled method were not necessary.

Here are some of my informal benchmarking numbers on function calling/message passing/method invocation overhead on a 2Ghz G5 iMac. The numbers are average call times for nop methods called several hundred thousand times:

Objective-C message passing: ~0.055 usec
C# method calls: ~0.04 usec
Full, non-cached embedded Mono C API lookup/invocation (parent lookup, etc): ~6 usec
locally saved Mono C API (using the same MonoMethod* over and over): ~2.9 usec
self-written caching, using Judy Arrays: ~3.2 usec

I'm currently using a caching scheme that uses (MonoClass*, method name, number of arguments) as a key that maps to MonoMethod* pointers. I'm hoping I can reduce call overhead further by mapping the same key straight to function pointers. What do you think? The unmanaged thunk proposal in the embedding page sounds interesting, but I'd be happy with something more complicated.

See you guys next week at PDC.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-dev] Mono on OSX 10.4 (Cocoa and Threading)
On Sep 1, 2005, at 10:52 PM, Frank Bergmann wrote:
> I must say I had trouble creating a test case (since it is a rather big project). I found out the following though: under Win32 I used the Mutex class during thread spawning to ensure thread safety. Under OS X these Mutexes caused a deadlock. Even after creation the first WaitOne() would not return. Writing up a test case did not work out again, as every simple use of the Mutex class did what it was supposed to do. So for now I removed them and watch my step.

I suspect there may be a bug in the mono Mutex/Monitor implementation under OS X. I have experienced similar deadlocks. The toy code I posted for bug #75558 sometimes exhibits deadlocking behaviour under 10.4.2:

http://bugs.ximian.com/show_bug.cgi?id=75558

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-devel-list] plans for a native AES wrapper.
On Jul 21, 2005, at 10:04 AM, Paolo Molaro wrote:
> As explained a few days ago in another mail: internal calls have nothing to do with speed. They only belong inside the general purpose mono if they are generally useful or in your specialized embedding app. So they are not appropriate as a substitute for pinvoking into an unmanaged lib.
>
> lupus

I think I may have misread the part of the wiki that talks about embedding mono (http://mono-project.com/Embedding_Mono):

"The Mono runtime provides two mechanisms to expose C code to the CIL universe: internal calls and native C code. Internal calls are tightly integrated with the runtime, and have the least overhead, as they use the same data types that the runtime uses. The other option is to use the Platform Invoke (P/Invoke) to call C code from the CIL universe, using the standard P/Invoke mechanisms."

Does that text actually list *three* options, not two? It also seems to suggest that internal calls are faster than *something*. I had previously thought that it was comparing native calls to P/Invoke. Was I wrong?

We've written both a P/Invoke and an internal call Rijndael wrapper. There is a negligible performance difference between the two. We may still ship the internal call implementation for reasons unrelated to performance; it reduces our library dependencies and makes it harder to do something malicious via library substitution.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-devel-list] plans for a native AES wrapper.
After last week's AES benchmarking, we've decided to write a managed-native wrapper around the openssl libcrypto library for the sake of performance. From my experience with embedded mono, it seems straightforward enough to write a RijndaelNative class that contains method declarations marked as internal calls that I will register at runtime.

This will work fine for the cases where I'm embedding mono inside a native application, but I don't know enough about mono to know if I can use this same strategy for situations in which I'm not embedding mono. Is it possible to register internal calls at runtime when running mono like a normal, sane person? Will this be any faster than using P/Invoke?

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
Re: [Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)
On Jul 18, 2005, at 2:59 AM, Paolo Molaro wrote:
> On 07/15/05 Allan Hsu wrote:
>> Is there any reference on what sorts of things you can change using mono_set_defaults? Following the mono source for references to that function wasn't particularly enlightening. It would be useful if the Wiki page on embedding mono mentioned JIT optimization.
>
> grep mono_set_defaults *.c
> mini.c:mono_set_defaults (int verbose_level, guint32 opts)
>
> Should be pretty evident. Just always use the result of mono_parse_default_optimizations (NULL) as the opts value.

I understood the verbose_level parameter, but the opts parameter was what mystified me. I should have been more specific about what I was looking for. At the time, I didn't understand the value that mono_parse_default_optimizations() returns or what values you can pass in to affect it. I've since traced it back to the relevant code in driver.c and the mini-X.c platform code and see how it works. Is it safe to mess with those parameters, or will it cause undefined results?

>> To be fair, the native implementation is able to take advantage of 64-bit processors when available, while all mono builds in the above benchmarks are 32-bit. The Windows XP machine is the standard 32-bit install, even though the processor is 64-bit. This is a pretty informal benchmark, but all I'm interested in showing here is how bad the AES performance under mono is.
>
> The current implementation causes lots of spilling and other unnecessary work which the jit doesn't remove (the work massi is doing should improve this). Some parts of it can be easily changed to use unsafe code and that should improve performance a lot: I'll leave that to Sebastien :-)

This is good to hear. I hope the benchmarking I did will provide some information that somebody will find useful. For my specific application, there is no such thing as enough performance :) I plan on writing a managed wrapper around libcrypto for this reason. This will be the subject of another email.

> Some of the data looks definitely bogus: it reports a stall even on the addi, here:
>
> 0x2e143c8  lwz r4,32(r1)   3:1  Stall=2
> 0x2e143cc  lwz r5,12(r4)   3:1  Stall=2
> 0x2e143d0  cmplwi r5,0x    3:1  Stall=2
> 0x2e143d4  blel $+696 0x2e1468c [8B]  2:1
> 0.4%  0x2e143d8  addi r4,r4,16  2:1  Stall=1
> [...]
>
>> As for the stall statistics, you have misread them. Each line that says Stall=N is saying that the instruction latency of the marked instruction will cause a subsequent dependent instruction to stall, not that the marked instruction itself will stall. N is the maximum number of stall cycles for the nearest dependent instruction.
>
> Since the tool reports that the addi stalls only sometimes (check the similar code sequences where no stall is reported), my take is that your interpretation of the data reported is not correct.

I'm not sure if my meaning came across. The line next to the addi instruction that says Stall=1 means that a dependent instruction *following* the addi looks like it will stall while waiting for the results from addi, not that the addi instruction itself will stall. The code that follows that specific instruction looks like this:

0.4%  0x2e143d8  addi r4,r4,16   2:1  Stall=1
      0x2e143dc  lbz r4,0(r4)    3:1  Stall=2
      0x2e143e0  add r3,r3,r4    2:1  Stall=1
      0x2e143e4  stw r3,44(r1)   3:1

The instruction latency of the addi instruction is 2 cycles; the lbz that immediately follows the addi is dependent on the addi. The lbz will stall for 1 cycle. That is what the Shark output is trying to say.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779
[Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)
On Jul 15, 2005, at 3:39 AM, Paolo Molaro wrote:
> On 07/14/05 Allan Hsu wrote:
>> Code generated by the PPC code emitter performs very poorly in comparison to the same code emitted for other platforms (most notably, x86). I had a brief conversation about this with Miguel in #mono today and he suggested that I post some examples.
>
> I'm sure he meant an actual test case, which you didn't provide.

I apologize for that. I was sharing the information I had already gathered as part of an investigation into the poor performance of the OS X port of our product. I was not sure if this sort of data was useful or if, as seems the case, I was doing something wrong. It looks like the performance problems I was running into are not specific to PPC, but the lack of JIT optimization (which I've remedied) made them *very* apparent.

>> Preliminary profiling with Shark (a profiling tool that is part of the Apple CHUD tools) shows some heinously inefficient JIT output on both G4 and G5 machines. Here's some sample Shark analysis on the code emitted by mono 1.1.8.1 from System.Security.Cryptography.RijndaelTransform.ECB(byte[], byte[]) and System.Security.Cryptography.RijndaelTransform.ShiftRow(bool):
>>
>> http://strangecargo.org/~allan/mono/
>
> It looks like optimizations are not enabled: are you embedding mono in your app? You should try adding:
>
> mono_set_defaults (0, mono_parse_default_optimizations (NULL));
>
> before the call to mono_jit_init ().

I am indeed using embedded mono, and I was not at all aware that optimizations were disabled by default. This does not occur in any of the sample code that I've seen and this is the first I've heard of it. Is there any reference on what sorts of things you can change using mono_set_defaults? Following the mono source for references to that function wasn't particularly enlightening. It would be useful if the Wiki page on embedding mono mentioned JIT optimization.

I have done some more isolated testing of AES performance after turning on optimization and it seems that the JIT-emitted PPC code is roughly on par with x86 mono performance. Here is the code I used for some simple benchmarking:

http://strangecargo.org/~allan/mono/aes.tar.bz2

Here are some times for 1000 encrypts/decrypts of 32768-byte chunks from some machines we have here in the office, ordered by speed:

57.7 seconds under mono 1.1.8.1, OS X 10.4.2 (1.67 Ghz G4 1.2)
55.0 seconds under mono 1.1.8.1, Linux 2.6.9 (1.8 Ghz Athlon XP 2500+)
45.8 seconds under mono 1.1.8.1, Linux 2.6.9 (2.2 Ghz Athlon 64 3200+)
42.4 seconds under mono 1.1.8.1, OS X 10.4.2 (2.0 Ghz G5 3.0)
9.01 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 Ghz Athlon 64 3200+)

If you look at the benchmark code, it uses RijndaelManaged to do encrypt/decrypt. This class is supposedly 100% managed code in the Microsoft implementation. Included in the tarball is some native code that links against OpenSSL to do the same thing. This is what native performance for the same sized chunks looks like:

1.67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (1.8 Ghz Athlon XP 2500+)
1.44 seconds under OpenSSL 0.9.7, OS X 10.4.2 (1.67 Ghz G4 1.2)
1.05 seconds under OpenSSL 0.9.7, OS X 10.4.2 (2.0 Ghz G5 3.0)
.67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (2.2 Ghz Athlon 64 3200+)

To be fair, the native implementation is able to take advantage of 64-bit processors when available, while all mono builds in the above benchmarks are 32-bit. The Windows XP machine is the standard 32-bit install, even though the processor is 64-bit. This is a pretty informal benchmark, but all I'm interested in showing here is how bad the AES performance under mono is.

It was suggested in #mono that I try compiling the mono AES implementation under VS.NET and run it under the Microsoft VM to compare performance. The resulting project is available here:

http://strangecargo.org/~allan/mono/AESSpeedTest.zip

The same operation benchmarks thusly:

22.76 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 Ghz Athlon 64 3200+)

The AES code is taken from mono svn, so it may be different from the code used in the mono 1.1.8.1 benchmarks above. While switching to the Microsoft VM boosts speed significantly, it looks like significant gains could be made by optimizing the mono RijndaelManaged code. (Some insightful comment would go here if I weren't so tired of writing this email.)

-Allan

[everything below doesn't matter so much, since it was based on information gathered from unoptimized JIT output]

>> Information on how to read Shark analysis comes with Shark (available for free from the Apple Developer Connection website).
>
> A direct pointer to the doc would be useful.

Unfortunately, I can't find a copy of the documentation that's available online (otherwise, I would have linked it). The closest thing I can find to online documentation is this document: http://developer.apple.com/tools
[Mono-devel-list] poor PPC JIT output
Code generated by the PPC code emitter performs very poorly in comparison to the same code emitted for other platforms (most notably, x86). I had a brief conversation about this with Miguel in #mono today and he suggested that I post some examples.

Preliminary profiling with Shark (a profiling tool that is part of the Apple CHUD tools) shows some heinously inefficient JIT output on both G4 and G5 machines. Here's some sample Shark analysis on the code emitted by mono 1.1.8.1 from System.Security.Cryptography.RijndaelTransform.ECB(byte[], byte[]) and System.Security.Cryptography.RijndaelTransform.ShiftRow(bool):

http://strangecargo.org/~allan/mono/

Information on how to read Shark analysis comes with Shark (available for free from the Apple Developer Connection website). (A summary: numerous and frequent pipeline stalls, unoptimized loops.)

Is there any active effort to optimize the PPC code emitter? The above two methods account for the majority of CPU time on a pegged 2Ghz G5 while decrypting AES blocks coming off the wire. The x86 machine encrypting the data (also running mono) doesn't even break a sweat.

-Allan

--
Allan Hsu
allan at counterpop dot net
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779