Title: RE: [Mono-list] Performance / array access

Also significant optimizations that MS does here are frame-pointer omission and passing params in registers.

Tom, can you try your tests on MS with the following change: ?

-             for(int i=0; i< size; i++)
---
+             for(int i=0; i< array.Length; i++)

Piers.

> -----Original Message-----
> From: Tom Fransen [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, January 19, 2003 8:02 AM
> To: Mitchell Skinner
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello all,
>
> Mitch, to come back on the issue of bound checking. I looked
> further into the problem and my first conclusion was wrong.
> Piers already stated that he did not expect that the MS
> compiler would be able to optimize all these check out of the loop.
>
> I again started looking into the assembly code and although
> I'm not a assembly programmer (it has been too long since I
> done some 6502/Z80 :-) I think the performance difference
> (>2) comes from the optimization performered by the MS JIT
> compiler (I am using version 1.0 of the SDK).
>
> The MS JIT seems to be better at allocating processor
> registers to certain variables resulting in much faster code.
> The execution speed of the Mono code is in the same order as
> the unoptimized code generated by the MS JIT compiler (for
> tight loops accessing arrays). In case of my bubble sorting
> algorithm this results in a speed difference of a factor 3. I
> have made other tests like applying matrix multiplications on
> a 2-dimensional array (to simulate an image processing
> operation) and also here Mono lacks behind. I haven't sorted
> this out but my first guess is that its again the register
> optimization that makes the difference.
>
> For the code snippets below both the MS and Mono code is given:
>
>        private void TestMethod(int size)
>         {
>             array = new int[size];
>             for(int i=0; i< size; i++)
>             {
>                 array[i] = size - i;
>             }
>             Console.WriteLine("Done" );
>             string s = Console.ReadLine();
>         }
>
>  The MS JIT generates the following code. Note that
>  I first run the program and latter when the program reads
>  from the console attach the debugger. If I run the program
> from  the MS development environment the JIT seems to
> generate unoptimized  code so I need this trick.
>
>
>
>  MS Code generated by the JIT unoptimized
> =========================================
>
>
>         private void TestMethod(int size)
>         {
>             array = new int[size];
> 00000000  push        ebp
> 00000001  mov         ebp,esp
> 00000003  sub         esp,14h
> 00000006  push        edi
> 00000007  push        esi
> 00000008  push        ebx
> 00000009  mov         ebx,ecx
> 0000000b  mov         esi,edx
> 0000000d  xor         edi,edi
> 0000000f  mov         dword ptr [ebp-10h],0
> 00000016  test        esi,80000000h
> 0000001c  je          00000025
> 0000001e  xor         ecx,ecx
> 00000020  call        763A2CFF
> 00000025  mov         edx,esi
> 00000027  mov         ecx,2DE6B3Ah
> 0000002c  call        FD4A2040
> 00000031  lea         edx,[ebx+4]
> 00000034  call        76479768
>             for(int i=0; i< size; i++)
> 00000039  xor         edi,edi
> 0000003b  nop
> 0000003c  jmp         00000056
>                 array[i] = size - i;
> 0000003e  mov         eax,dword ptr [ebx+4]    <==== Start
> loop, fetch the
> size of the array
> 00000041  cmp         edi,dword ptr [eax+4]    <==== Check if
> within bounds
> 00000044  jb          0000004D
> 00000046  xor         ecx,ecx
> 00000048  call        762EED87                       <====
> Trigger exception
> 0000004d  mov         edx,esi                        <====
> esi contain variable size
> 0000004f  sub         edx,edi
> 00000051  mov         dword ptr [eax+edi*4+8],edx
>             for(int i=0; i< size; i++)
> 00000055  inc         edi                    <==== edi
> contain variable i
> 00000056  cmp         edi,esi
> 00000058  jl          0000003E                 <==== Jump to
> start of for
> loop
>             Console.WriteLine("Done" );
> 0000005a  mov         ecx,dword ptr ds:[01C40178h]
> 00000060  call        76AA3250
>             string s = Console.ReadLine();
> 00000065  call        76AA2F10
> 0000006a  mov         dword ptr [ebp-14h],eax
> 0000006d  mov         eax,dword ptr [ebp-14h]
> 00000070  mov         dword ptr [ebp-10h],eax
>         }
> 00000073  nop
> 00000074  pop         ebx
> 00000075  pop         esi
> 00000076  pop         edi
> 00000077  mov         esp,ebp
> 00000079  pop         ebp
> 0000007a  ret
>
>
> MS Code generated by the JIT optimized
> ======================================
>
>
>         private void TestMethod(int size)
>         {
>             array = new int[size];
> 00000000  push        edi
> 00000001  push        esi
> 00000002  mov         edi,ecx
> 00000004  mov         esi,edx
> 00000006  test        esi,80000000h
> 0000000c  jne         0000005C
> 0000000e  mov         edx,esi
> 00000010  mov         ecx,2DE6A22h
> 00000015  call        FD5B2098
> 0000001a  lea         edx,[edi+4]
> 0000001d  call        765897C0
>             for(int i=0; i< size; i++)
> 00000022  xor         ecx,ecx
> 00000024  cmp         esi,0
> 00000027  jle         00000040
> 00000029  mov         edx,dword ptr [edi+4]
> 0000002c  mov         edi,dword ptr [edx+4]
>                 array[i] = size - i;
> 0000002f  cmp         ecx,edi                         <===
> start of loop
> 00000031  jae         00000064                               
> <=== index out of bounds
> 00000033  mov         eax,esi                           <===
> esi contain
> variable size
> 00000035  sub         eax,ecx
> 00000037  mov         dword ptr [edx+ecx*4+8],eax
>             for(int i=0; i< size; i++)
> 0000003b  inc         ecx                             <====
> ecx contain variable i
> 0000003c  cmp         ecx,esi
> 0000003e  jl          0000002F                               
> <==== jump to start of for loop
>             Console.WriteLine("Done" );
> 00000040  mov         ecx,dword ptr ds:[01C4086Ch]
> 00000046  mov         edx,dword ptr ds:[01C40110h]
> 0000004c  mov         eax,dword ptr [ecx]
> 0000004e  call        dword ptr [eax+000000D8h]
>             string s = Console.ReadLine();
> 00000054  call        76BB2F68
> 00000059  pop         esi
>         }
> 0000005a  pop         edi
> 0000005b  ret
> 0000005c  xor         ecx,ecx
> 0000005e  call        764B2D57
> 00000063  int         3
> 00000064  xor         ecx,ecx                                
> <==== Exception
> 00000066  call        763FEDDF
> 0000006b  int         3
>
>
> The Mono (0.18) JIT generates the following code (--dump-asm)
> =============================================================
>
> Disassembly of section .text:
>
> 00000000 <SpeedBenchmarks.BubbleSort_TestMethod>:
>    0: 55                      push   %ebp
>    1: 8b ec                   mov    %esp,%ebp
>    3: 53                      push   %ebx
>    4: 56                      push   %esi
>    5: 83 ec 28                sub    $0x28,%esp
>    8: c7 45 d8 00 00 00 00    movl   $0x0,0xffffffd8(%ebp)
>    f: 8b 45 08                mov    0x8(%ebp),%eax
>   12: 05 08 00 00 00          add    $0x8,%eax
>   17: 83 38 00                cmpl   $0x0,(%eax)
>   1a: 8b 4d 0c                mov    0xc(%ebp),%ecx
>   1d: 50                      push   %eax
>   1e: 51                      push   %ecx
>   1f: 52                      push   %edx
>   20: 51                      push   %ecx
>   21: 68 54 7e 18 08          push   $0x8187e54
>   26: e8 b5 c2 e4 ff          call   ffe4c2e0
> <SpeedBenchmarks.BubbleSort_TestMethod+0xffe4c2e0>
>   2b: 83 c4 08                add    $0x8,%esp
>   2e: 5a                      pop    %edx
>   2f: 59                      pop    %ecx
>   30: 8b c8                   mov    %eax,%ecx
>   32: 58                      pop    %eax
>   33: 89 08                   mov    %ecx,(%eax)
>   35: be 00 00 00 00          mov    $0x0,%esi
> <=== esi contain variable i
>   3a: e9 29 00 00 00          jmp    68
> <SpeedBenchmarks.BubbleSort_TestMethod+0x68>
>   3f: 8b 45 08                mov    0x8(%ebp),%eax
> <=== start loop
>   42: 8b 40 08                mov    0x8(%eax),%eax
> \
>   45: 8b ce                   mov    %esi,%ecx
> |
>   47: 3b 48 0c                cmp    0xc(%eax),%ecx
> | Bounds check
>   4a: 72 0a                   jb     56
> <SpeedBenchmarks.BubbleSort_TestMethod+0x56>/
>   4c: 68 e1 26 11 08          push   $0x81126e1
>   51: e8 ce 56 ed ff          call   ffed5724
> <SpeedBenchmarks.BubbleSort_TestMethod+0xffed5724> <===
> Generate exception
>   56: 8d 44 88 10             lea    0x10(%eax,%ecx,4),%eax
>   5a: 8b 4d 0c                mov    0xc(%ebp),%ecx
> <== retrieve size
>   5d: 2b ce                   sub    %esi,%ecx
>   5f: 89 08                   mov    %ecx,(%eax)
>   61: 8b c6                   mov    %esi,%eax
> <=== move i to eax
>   63: 40                      inc    %eax                    
>                       <=== i++
>   64: 8b d8                   mov    %eax,%ebx
> <=== (detour??)
>   66: 8b f3                   mov    %ebx,%esi
> <=== move i back to esi
>   68: 8b 45 0c                mov    0xc(%ebp),%eax
> <=== retrieve size
>   6b: 3b f0                   cmp    %eax,%esi
> <=== check if i < size
>   6d: 0f 8c cc ff ff ff       jl     3f
> <SpeedBenchmarks.BubbleSort_TestMethod+0x3f>  <=== end loop
>   73: 68 38 da 16 08          push   $0x816da38
>   78: 8b 05 78 df 16 08       mov    0x816df78,%eax
>   7e: 50                      push   %eax
>   7f: 83 38 00                cmpl   $0x0,(%eax)
>   82: 8b 00                   mov    (%eax),%eax
>   84: ff 90 a0 00 00 00       call   *0xa0(%eax)
>   8a: 83 c4 08                add    $0x8,%esp
>   8d: 8b 05 78 df 16 08       mov    0x816df78,%eax
>   93: 50                      push   %eax
>   94: 83 38 00                cmpl   $0x0,(%eax)
>   97: 8b 00                   mov    (%eax),%eax
>   99: ff 90 c0 00 00 00       call   *0xc0(%eax)
>   9f: 83 c4 04                add    $0x4,%esp
>   a2: 8b 05 80 df 16 08       mov    0x816df80,%eax
>   a8: 50                      push   %eax
>   a9: 83 38 00                cmpl   $0x0,(%eax)
>   ac: 8b 00                   mov    (%eax),%eax
>   ae: ff 90 7c 00 00 00       call   *0x7c(%eax)
>   b4: 83 c4 04                add    $0x4,%esp
>   b7: 8b f0                   mov    %eax,%esi
>   b9: 8b de                   mov    %esi,%ebx
>   bb: 8d 65 f8                lea    0xfffffff8(%ebp),%esp
>   be: 5e                      pop    %esi
>   bf: 5b                      pop    %ebx
>   c0: c9                      leave
>   c1: c3                      ret
>
>
> -----Original Message-----
> From: Mitchell Skinner [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 15, 2003 12:42 AM
> To: Tom Fransen
> Cc: [EMAIL PROTECTED]
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello,
>
> The runtime inserts array bounds checks to prevent things
> like buffer overflows.  So the following code:
>    int len = array.Length;
>    for (int i = 0; i < len; ++i)
>       sum += array [i];
> becomes something like:
>    int len = array.Length;
>    for (int i = 0; i < len; ++i) {
>       if (i >= array.Length) throw new Exception; //forgot which
>       sum += array [i];
>    }
>
> In the fast case, the MS JIT is smart enough to analyze the
> following code and figure out that "i" will not go out of the
> bounds of the array (because i starts at zero and stays below
> array.Length):
>    for (int i = 0; i < array.Length; ++i)
>       sum += array [i];
>
> If the for loop isn't set up exactly like that, I don't think
> the MS JIT is able to eliminate the bounds check (unless
> perhaps there are constants involved).  Is the code you have
> below (comparing i with a "size" variable) the exact code you
> tested?  If "size" is not a constant, I would be suprised to
> find that there was a major difference between MS and mono.
>
> Are you using the MS 1.0 release or the 1.1 release?
>
> Mitch
>
>
> On Tue, 2003-01-14 at 13:38, Tom Fransen wrote:
> > Piers,
> >
> > can you spend a few more words on this. Why is the second
> case slower?
> >
> > Furthermore if I use a simple loop to fill an array the
> > speed difference is a factor 3. I have a bubble sort method that I
> > execute a larger number of times. On Mono it takes 19
> seconds on the
> > MS stuff ~6 seconds. This is a big difference. So why I am losing a
> > factor 3 on the following loop.
> >
> >                 // Fill aray worst case, all elements need
> to be swapped
> >                 for(int i=0; i< size; i++)
> >                 {
> >                     array[i] = size - i;
> >                 }
> >
> > I am testing with some small benchmarks, but real
> applications (e.g.
> > an
> MP3
> > decoder)
> > often use tables stored in arrays to do certain calculations. So
> > although the benchmark maybe syntetic certain application
> will suffer
> > from these penalties.
> >
> > regards,
> > Tom
> >
> > -----Original Message-----
> > From: Miguel de Icaza [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, January 13, 2003 3:22 AM
> > To: Piers Haken
> > Cc: Tom Fransen; [EMAIL PROTECTED]
> > Subject: RE: [Mono-list] Performance / array access
> >
> >
> > Hello,
> >
> > > Yeah, Microsoft's JIT lifts invariant bounds-checks. But
> I believe
> > > it's pretty limited.
> > >
> > > For example, the check is removed in the following case:
> > >
> > >   for (int i = 0; i < array.Length; ++i)
> > >     sum += array [i];
> > >
> > > But not here:
> > >
> > >   int len = array.Length;
> > >   for (int i = 0; i < len; ++i)
> > >     sum += array [i];
> > >
> > > So the first case is (counter-intuitively) faster than the second.
> > >
> > > I don't believe Mono's JIT makes this optimization. Maybe the new
> > > JIT will ;-)
> >
> > This is a very good observation.  Because it seems that this
> > particular kind of loop is detected by the JIT engine.
> >
> > Array-bounds-check elimination is something we want to do
> with the new
> > JIT, but it is not implemented at this point.  The new JIT
> features a
> > new intermediate representation that simplifies
> implementing this sort
> > of thing, but it is still on its bootstrapping phases of life.
> >
> > Miguel
> >
> >
> > _______________________________________________
> > Mono-list maillist  -  [EMAIL PROTECTED]
> > http://lists.ximian.com/mailman/listinfo/mono-list
> --
> Mitchell Skinner <[EMAIL PROTECTED]>
>
>
> _______________________________________________
> Mono-list maillist  -  [EMAIL PROTECTED]
> http://lists.ximian.com/mailman/listinfo/mono-list
>

Reply via email to