Also significant optimizations that MS does here are frame-pointer omission and passing params in registers.
Tom, can you try your tests on MS with the following change: ?
- for(int i=0; i< size; i++)
---
+ for(int i=0; i< array.Length; i++)
Piers.
> -----Original Message-----
> From: Tom Fransen [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, January 19, 2003 8:02 AM
> To: Mitchell Skinner
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello all,
>
> Mitch, to come back on the issue of bound checking. I looked
> further into the problem and my first conclusion was wrong.
> Piers already stated that he did not expect that the MS
> compiler would be able to optimize all these check out of the loop.
>
> I again started looking into the assembly code and although
> I'm not a assembly programmer (it has been too long since I
> done some 6502/Z80 :-) I think the performance difference
> (>2) comes from the optimization performered by the MS JIT
> compiler (I am using version 1.0 of the SDK).
>
> The MS JIT seems to be better at allocating processor
> registers to certain variables resulting in much faster code.
> The execution speed of the Mono code is in the same order as
> the unoptimized code generated by the MS JIT compiler (for
> tight loops accessing arrays). In case of my bubble sorting
> algorithm this results in a speed difference of a factor 3. I
> have made other tests like applying matrix multiplications on
> a 2-dimensional array (to simulate an image processing
> operation) and also here Mono lacks behind. I haven't sorted
> this out but my first guess is that its again the register
> optimization that makes the difference.
>
> For the code snippets below both the MS and Mono code is given:
>
> private void TestMethod(int size)
> {
> array = new int[size];
> for(int i=0; i< size; i++)
> {
> array[i] = size - i;
> }
> Console.WriteLine("Done" );
> string s = Console.ReadLine();
> }
>
> The MS JIT generates the following code. Note that
> I first run the program and latter when the program reads
> from the console attach the debugger. If I run the program
> from the MS development environment the JIT seems to
> generate unoptimized code so I need this trick.
>
>
>
> MS Code generated by the JIT unoptimized
> =========================================
>
>
> private void TestMethod(int size)
> {
> array = new int[size];
> 00000000 push ebp
> 00000001 mov ebp,esp
> 00000003 sub esp,14h
> 00000006 push edi
> 00000007 push esi
> 00000008 push ebx
> 00000009 mov ebx,ecx
> 0000000b mov esi,edx
> 0000000d xor edi,edi
> 0000000f mov dword ptr [ebp-10h],0
> 00000016 test esi,80000000h
> 0000001c je 00000025
> 0000001e xor ecx,ecx
> 00000020 call 763A2CFF
> 00000025 mov edx,esi
> 00000027 mov ecx,2DE6B3Ah
> 0000002c call FD4A2040
> 00000031 lea edx,[ebx+4]
> 00000034 call 76479768
> for(int i=0; i< size; i++)
> 00000039 xor edi,edi
> 0000003b nop
> 0000003c jmp 00000056
> array[i] = size - i;
> 0000003e mov eax,dword ptr [ebx+4] <==== Start
> loop, fetch the
> size of the array
> 00000041 cmp edi,dword ptr [eax+4] <==== Check if
> within bounds
> 00000044 jb 0000004D
> 00000046 xor ecx,ecx
> 00000048 call 762EED87 <====
> Trigger exception
> 0000004d mov edx,esi <====
> esi contain variable size
> 0000004f sub edx,edi
> 00000051 mov dword ptr [eax+edi*4+8],edx
> for(int i=0; i< size; i++)
> 00000055 inc edi <==== edi
> contain variable i
> 00000056 cmp edi,esi
> 00000058 jl 0000003E <==== Jump to
> start of for
> loop
> Console.WriteLine("Done" );
> 0000005a mov ecx,dword ptr ds:[01C40178h]
> 00000060 call 76AA3250
> string s = Console.ReadLine();
> 00000065 call 76AA2F10
> 0000006a mov dword ptr [ebp-14h],eax
> 0000006d mov eax,dword ptr [ebp-14h]
> 00000070 mov dword ptr [ebp-10h],eax
> }
> 00000073 nop
> 00000074 pop ebx
> 00000075 pop esi
> 00000076 pop edi
> 00000077 mov esp,ebp
> 00000079 pop ebp
> 0000007a ret
>
>
> MS Code generated by the JIT optimized
> ======================================
>
>
> private void TestMethod(int size)
> {
> array = new int[size];
> 00000000 push edi
> 00000001 push esi
> 00000002 mov edi,ecx
> 00000004 mov esi,edx
> 00000006 test esi,80000000h
> 0000000c jne 0000005C
> 0000000e mov edx,esi
> 00000010 mov ecx,2DE6A22h
> 00000015 call FD5B2098
> 0000001a lea edx,[edi+4]
> 0000001d call 765897C0
> for(int i=0; i< size; i++)
> 00000022 xor ecx,ecx
> 00000024 cmp esi,0
> 00000027 jle 00000040
> 00000029 mov edx,dword ptr [edi+4]
> 0000002c mov edi,dword ptr [edx+4]
> array[i] = size - i;
> 0000002f cmp ecx,edi <===
> start of loop
> 00000031 jae 00000064
> <=== index out of bounds
> 00000033 mov eax,esi <===
> esi contain
> variable size
> 00000035 sub eax,ecx
> 00000037 mov dword ptr [edx+ecx*4+8],eax
> for(int i=0; i< size; i++)
> 0000003b inc ecx <====
> ecx contain variable i
> 0000003c cmp ecx,esi
> 0000003e jl 0000002F
> <==== jump to start of for loop
> Console.WriteLine("Done" );
> 00000040 mov ecx,dword ptr ds:[01C4086Ch]
> 00000046 mov edx,dword ptr ds:[01C40110h]
> 0000004c mov eax,dword ptr [ecx]
> 0000004e call dword ptr [eax+000000D8h]
> string s = Console.ReadLine();
> 00000054 call 76BB2F68
> 00000059 pop esi
> }
> 0000005a pop edi
> 0000005b ret
> 0000005c xor ecx,ecx
> 0000005e call 764B2D57
> 00000063 int 3
> 00000064 xor ecx,ecx
> <==== Exception
> 00000066 call 763FEDDF
> 0000006b int 3
>
>
> The Mono (0.18) JIT generates the following code (--dump-asm)
> =============================================================
>
> Disassembly of section .text:
>
> 00000000 <SpeedBenchmarks.BubbleSort_TestMethod>:
> 0: 55 push %ebp
> 1: 8b ec mov %esp,%ebp
> 3: 53 push %ebx
> 4: 56 push %esi
> 5: 83 ec 28 sub $0x28,%esp
> 8: c7 45 d8 00 00 00 00 movl $0x0,0xffffffd8(%ebp)
> f: 8b 45 08 mov 0x8(%ebp),%eax
> 12: 05 08 00 00 00 add $0x8,%eax
> 17: 83 38 00 cmpl $0x0,(%eax)
> 1a: 8b 4d 0c mov 0xc(%ebp),%ecx
> 1d: 50 push %eax
> 1e: 51 push %ecx
> 1f: 52 push %edx
> 20: 51 push %ecx
> 21: 68 54 7e 18 08 push $0x8187e54
> 26: e8 b5 c2 e4 ff call ffe4c2e0
> <SpeedBenchmarks.BubbleSort_TestMethod+0xffe4c2e0>
> 2b: 83 c4 08 add $0x8,%esp
> 2e: 5a pop %edx
> 2f: 59 pop %ecx
> 30: 8b c8 mov %eax,%ecx
> 32: 58 pop %eax
> 33: 89 08 mov %ecx,(%eax)
> 35: be 00 00 00 00 mov $0x0,%esi
> <=== esi contain variable i
> 3a: e9 29 00 00 00 jmp 68
> <SpeedBenchmarks.BubbleSort_TestMethod+0x68>
> 3f: 8b 45 08 mov 0x8(%ebp),%eax
> <=== start loop
> 42: 8b 40 08 mov 0x8(%eax),%eax
> \
> 45: 8b ce mov %esi,%ecx
> |
> 47: 3b 48 0c cmp 0xc(%eax),%ecx
> | Bounds check
> 4a: 72 0a jb 56
> <SpeedBenchmarks.BubbleSort_TestMethod+0x56>/
> 4c: 68 e1 26 11 08 push $0x81126e1
> 51: e8 ce 56 ed ff call ffed5724
> <SpeedBenchmarks.BubbleSort_TestMethod+0xffed5724> <===
> Generate exception
> 56: 8d 44 88 10 lea 0x10(%eax,%ecx,4),%eax
> 5a: 8b 4d 0c mov 0xc(%ebp),%ecx
> <== retrieve size
> 5d: 2b ce sub %esi,%ecx
> 5f: 89 08 mov %ecx,(%eax)
> 61: 8b c6 mov %esi,%eax
> <=== move i to eax
> 63: 40 inc %eax
> <=== i++
> 64: 8b d8 mov %eax,%ebx
> <=== (detour??)
> 66: 8b f3 mov %ebx,%esi
> <=== move i back to esi
> 68: 8b 45 0c mov 0xc(%ebp),%eax
> <=== retrieve size
> 6b: 3b f0 cmp %eax,%esi
> <=== check if i < size
> 6d: 0f 8c cc ff ff ff jl 3f
> <SpeedBenchmarks.BubbleSort_TestMethod+0x3f> <=== end loop
> 73: 68 38 da 16 08 push $0x816da38
> 78: 8b 05 78 df 16 08 mov 0x816df78,%eax
> 7e: 50 push %eax
> 7f: 83 38 00 cmpl $0x0,(%eax)
> 82: 8b 00 mov (%eax),%eax
> 84: ff 90 a0 00 00 00 call *0xa0(%eax)
> 8a: 83 c4 08 add $0x8,%esp
> 8d: 8b 05 78 df 16 08 mov 0x816df78,%eax
> 93: 50 push %eax
> 94: 83 38 00 cmpl $0x0,(%eax)
> 97: 8b 00 mov (%eax),%eax
> 99: ff 90 c0 00 00 00 call *0xc0(%eax)
> 9f: 83 c4 04 add $0x4,%esp
> a2: 8b 05 80 df 16 08 mov 0x816df80,%eax
> a8: 50 push %eax
> a9: 83 38 00 cmpl $0x0,(%eax)
> ac: 8b 00 mov (%eax),%eax
> ae: ff 90 7c 00 00 00 call *0x7c(%eax)
> b4: 83 c4 04 add $0x4,%esp
> b7: 8b f0 mov %eax,%esi
> b9: 8b de mov %esi,%ebx
> bb: 8d 65 f8 lea 0xfffffff8(%ebp),%esp
> be: 5e pop %esi
> bf: 5b pop %ebx
> c0: c9 leave
> c1: c3 ret
>
>
> -----Original Message-----
> From: Mitchell Skinner [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 15, 2003 12:42 AM
> To: Tom Fransen
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello,
>
> The runtime inserts array bounds checks to prevent things
> like buffer overflows. So the following code:
> int len = array.Length;
> for (int i = 0; i < len; ++i)
> sum += array [i];
> becomes something like:
> int len = array.Length;
> for (int i = 0; i < len; ++i) {
> if (i >= array.Length) throw new Exception; //forgot which
> sum += array [i];
> }
>
> In the fast case, the MS JIT is smart enough to analyze the
> following code and figure out that "i" will not go out of the
> bounds of the array (because i starts at zero and stays below
> array.Length):
> for (int i = 0; i < array.Length; ++i)
> sum += array [i];
>
> If the for loop isn't set up exactly like that, I don't think
> the MS JIT is able to eliminate the bounds check (unless
> perhaps there are constants involved). Is the code you have
> below (comparing i with a "size" variable) the exact code you
> tested? If "size" is not a constant, I would be suprised to
> find that there was a major difference between MS and mono.
>
> Are you using the MS 1.0 release or the 1.1 release?
>
> Mitch
>
>
> On Tue, 2003-01-14 at 13:38, Tom Fransen wrote:
> > Piers,
> >
> > can you spend a few more words on this. Why is the second
> case slower?
> >
> > Furthermore if I use a simple loop to fill an array the
> > speed difference is a factor 3. I have a bubble sort method that I
> > execute a larger number of times. On Mono it takes 19
> seconds on the
> > MS stuff ~6 seconds. This is a big difference. So why I am losing a
> > factor 3 on the following loop.
> >
> > // Fill aray worst case, all elements need
> to be swapped
> > for(int i=0; i< size; i++)
> > {
> > array[i] = size - i;
> > }
> >
> > I am testing with some small benchmarks, but real
> applications (e.g.
> > an
> MP3
> > decoder)
> > often use tables stored in arrays to do certain calculations. So
> > although the benchmark maybe syntetic certain application
> will suffer
> > from these penalties.
> >
> > regards,
> > Tom
> >
> > -----Original Message-----
> > From: Miguel de Icaza [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, January 13, 2003 3:22 AM
> > To: Piers Haken
> > Cc: Tom Fransen; [EMAIL PROTECTED]
> > Subject: RE: [Mono-list] Performance / array access
> >
> >
> > Hello,
> >
> > > Yeah, Microsoft's JIT lifts invariant bounds-checks. But
> I believe
> > > it's pretty limited.
> > >
> > > For example, the check is removed in the following case:
> > >
> > > for (int i = 0; i < array.Length; ++i)
> > > sum += array [i];
> > >
> > > But not here:
> > >
> > > int len = array.Length;
> > > for (int i = 0; i < len; ++i)
> > > sum += array [i];
> > >
> > > So the first case is (counter-intuitively) faster than the second.
> > >
> > > I don't believe Mono's JIT makes this optimization. Maybe the new
> > > JIT will ;-)
> >
> > This is a very good observation. Because it seems that this
> > particular kind of loop is detected by the JIT engine.
> >
> > Array-bounds-check elimination is something we want to do
> with the new
> > JIT, but it is not implemented at this point. The new JIT
> features a
> > new intermediate representation that simplifies
> implementing this sort
> > of thing, but it is still on its bootstrapping phases of life.
> >
> > Miguel
> >
> >
> > _______________________________________________
> > Mono-list maillist - [EMAIL PROTECTED]
> > http://lists.ximian.com/mailman/listinfo/mono-list
> --
> Mitchell Skinner <[EMAIL PROTECTED]>
>
>
> _______________________________________________
> Mono-list maillist - [EMAIL PROTECTED]
> http://lists.ximian.com/mailman/listinfo/mono-list
>
