On Wednesday, 29 January 2003 00:05, Ian Romanick wrote:
> Felix Kühling wrote:
> > On Tue, 28 Jan 2003 13:10:41 -0800
> >
> > Ian Romanick <[EMAIL PROTECTED]> wrote:
> >>Felix Kühling wrote:
> >>>The patch moves the load operations back to the front of the loop as in
> >>>the G3TN_norm_w_lengths case.
> >>
> >>Good catch. It looks like this went into the Mesa tree back in October
> >>of 2001...over a year ago! It looks like Andres Lewycky gave Brian some
> >>bad patches. :(
> >
> > Yeah, but until November 2002 (DRI trunk) there was a comment in 3dnow.c
> > that the 3dnow-normal code is broken and it was not used.
>
> D'oh!
;-)

> >>I realize that AMD recommends reading memory backwards, but would a
> >>quick-fix be to just use the 3Dnow! prefetch instructions?

"Block Prefetch", page 18, see below.

> > The prefetch instructions used are and must be 3DNow instructions. On
> > Intel, prefetch was introduced with the SSE extension on the PentiumIII.
> > They're not available on older Athlons and K6's.

It all depends on steppings...
Some output from MPlayer, the best-optimized OSS app I know:

CPU: Advanced Micro Devices Athlon 4 PM Palomino/Athlon MP Multiprocessor/Athlon XP eXtreme Performance (Family: 6, Stepping: 2)
Detected cache-line size is 64 bytes
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 0
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE

> > Anyway, all that
> > prefetching looks odd to me. In the first transform loop in
> > _mesa_3dnow_transform_normalize_normals memory is prefetched which is
> > never read but only written. This is obviously useless. Then in the
> > normalize loop the memory which was written before is prefetched again.
> > I think this is not necessary. The array is small enough to be still in
> > the cache.
>
> I believe that prefetchw tells the processor to warm up the cache line
> because it's going to be written soon. I think the prefetching in the
> first loop is probably correct. The prefetchw of (%eax) might need to
> be before the add. I'd have to benchmark it. I'm not sure if I have a
> 3dnow capable box around anymore. If I do, it will be an old K6-III. :)
>
> > I'll see if I can clean this up a bit. On the mesa-4-0-4 branch this
> > code is disabled anyway, so there is not really a hurry to apply my
> > stupid little patch. About this reading backward thing, where is that
> > documented? I have an AMD Athlon optimization guide from February 2002
> > which doesn't mention it.
>
> I've seen a reference posted to dri-devel a couple times.
All from me ;-)

> Here's a couple references that Dieter posted on 09-Jan-2003:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103548024914815&w=2
> http://208.15.46.63/events/gdc2002.htm

And here are some numbers:

nuetzel/Entwicklung> ./athlon-DN
1600.081 MHz
clear_page by 'normal_clear_page' took 12757 cycles (489.9 MB/s)
clear_page by 'slow_zero_page' took 12478 cycles (500.9 MB/s)
clear_page by 'fast_clear_page' took 9684 cycles (645.4 MB/s)
clear_page by 'faster_clear_page' took 4257 cycles (1468.0 MB/s)
copy_page by 'normal_copy_page' took 9063 cycles (689.6 MB/s)
copy_page by 'slow_copy_page' took 9051 cycles (690.5 MB/s)
copy_page by 'fast_copy_page' took 8125 cycles (769.3 MB/s)
copy_page by 'faster_copy' took 5468 cycles (1143.0 MB/s)
copy_page by 'even_faster' took 5538 cycles (1128.5 MB/s)
copy_page by 'no_prefetch' took 4462 cycles (1400.7 MB/s)

> I'm not sure if this applies to the K6 family or just to Athlons. I
> suspect it may only apply to Athlons, but we may have to test it.

According to AMD (see the gdc2002.htm presentation) it applies to _all_
modern x86 CPUs out there.

> >>Since these functions are globally exported, it might be worth it to
> >>write a quick test that calls the various _transform_normalize_normals
> >>functions to make sure that they all produce the same (or close enough)
> >>results.
> >
> > And:
> > _transform_normalize_normals_no_rot
> > _transform_rescale_normals_no_rot
> > _transform_rescale_normals
> > _transform_normals_no_rot
> > _transform_normals
> > _normalize_normals
> > _rescale_normals
> >
> > These should be tested too, while we're at it.

Yes.

-Dieter

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel