On Mon, Nov 08, 2004 at 11:32:24AM -0800, Ian Romanick wrote:
| This is something I've been thinking about ever since I saw the
| profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of
| information that would be useful to get out of the driver about performance

Have you taken a look at the SGIX_instruments extension? It provides a
framework that's intended for gathering profiling information
asynchronously. The idea was that you'd add separate extensions that
defined the actual instrumentation (SGIX_ir_instrument1 was an early
example).

I searched my archives for things I'd written on this subject in the
past. The following is probably the most comprehensive summary. Some
of it's out-of-date now, or has implications for hardware design that's
out of our control, but some of it still looks useful.

Allen


Purposes of Instrumentation

    Tuning
        Analyzing the app or database to improve overall performance
        and/or rendering quality. Typically done during the
        development phase. Examples: determining what percentage of
        triangles are clipped, or how well texture memory is utilized.

    Load Monitoring
        Gathering information to modify the behavior of the app or the
        structure of the database dynamically, to maintain a constant
        frame-rate. Typically done in real-time by production apps.
        Examples: determining how much time is spent in geometric
        processing and how much time in pixel-fill, in order to choose
        object level-of-detail.

    Debugging/Testing
        Graphics systems are extremely complex, and their behavior
        isn't always predictable. We can anticipate a need for
        machine-specific instrumentation in order to understand
        surprisingly high or low performance of an application, or for
        use during driver development.

Infrastructure

    The SGIX_instruments extension provides scaffolding for pipeline
    instrumentation. The framework allows the app to:

        Specify a buffer into which measurements will be delivered
        (asynchronously) by the pipe.

        Enable/disable an arbitrary collection of instruments.

        Start/stop/snapshot measurements by the currently-enabled set
        of instruments.

        Label a measurement with a user-selectable marker.

        Poll or wait for completion of a particular measurement.
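
The workflow in the list above can be pictured with a small toy model. This is only a sketch in Python, not OpenGL code: the class `InstrumentSession`, its methods, and the instrument name `texture_binds` are hypothetical names chosen for illustration and do not correspond to the entry points of SGIX_instruments. It assumes that a measurement is queued when the app stops or snapshots, and is copied into the app's buffer later, when the app polls.

```python
from collections import deque


class InstrumentSession:
    """Toy model of an asynchronous measurement framework (hypothetical API)."""

    def __init__(self, buffer_size):
        self.buffer = [None] * buffer_size  # app-supplied result buffer
        self.write_index = 0                # next free slot in the buffer
        self.enabled = set()                # instruments currently enabled
        self.counters = {}                  # live counters while measuring
        self.running = False
        self.pending = deque()              # measurements not yet delivered

    def enable(self, name):
        self.enabled.add(name)

    def disable(self, name):
        self.enabled.discard(name)

    def start(self):
        # Only instruments enabled at start time are measured.
        self.counters = {name: 0 for name in self.enabled}
        self.running = True

    def count(self, name, amount=1):
        # Called by the simulated pipe; ignored for disabled instruments.
        if self.running and name in self.counters:
            self.counters[name] += amount

    def snapshot(self, marker):
        # Queue a labelled copy of the counters without stopping.
        self.pending.append((marker, dict(self.counters)))

    def stop(self, marker):
        self.snapshot(marker)
        self.running = False

    def poll(self):
        # Deliver at most one pending measurement; return its marker or None.
        if not self.pending:
            return None
        if self.write_index >= len(self.buffer):
            raise BufferError("measurement buffer is full")
        marker, values = self.pending.popleft()
        self.buffer[self.write_index] = (marker, values)
        self.write_index += 1
        return marker


session = InstrumentSession(buffer_size=4)
session.enable("texture_binds")
session.start()
session.count("texture_binds")
session.count("texture_binds")
session.snapshot(marker=1)      # mid-frame reading, measurement continues
session.count("texture_binds")
session.stop(marker=2)          # final reading
while (marker := session.poll()) is not None:
    print(marker, session.buffer[session.write_index - 1][1])
```

The point of the marker is that the app never has to wait on the pipe: it can keep issuing commands and match results to requests whenever they show up in the buffer.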

    We must write one or more new extensions to define instruments that
    fit into the SGIX_instruments framework. This outline sketches some
    of the instruments that might be appropriate.

    Since some measurements are performed by real-time apps, it's
    important to keep the overhead low. The asynchronous delivery
    scheme helps with this, but it's also desirable to keep other
    issues in mind (for example, avoid flushing the pipe if at all
    possible).

Suggested Instruments

    Rendering Statistics

        Number of bytes of data sent to pipe
        Number of bytes of data sent from pipe

            These are used to identify data transfer bottlenecks
            arising from geometry-path commands, pixel-path commands,
            and texture management.

        Number of geometric primitives sent to pipe
        Number of geometric primitives trivially accepted or rejected
        Number of geometric primitives subjected to 3D clipping
        Number of geometric primitives resulting from 3D clipping
        Number of geometric primitives face-culled
        Number of matrix ops sent to pipe

            These measure culling effectiveness and determine the
            cause of geometry-processing bottlenecks (e.g., too many
            vertices, too much clipping, or too many attribute
            changes).

        Number of DrawPixels commands sent to pipe
        Number of Bitmap commands sent to pipe
        Number of ReadPixels commands sent to pipe
        Number of CopyPixels commands sent to pipe

            Together with the data transfer statistics, these help
            determine whether pixel-oriented apps are running into
            data transfer or pixel operation setup bottlenecks.

        Number of MakeCurrent/MakeCurrentRead commands executed

            This should help determine when apps are using more than
            the optimal number of contexts, and thus causing an
            inordinate number of context switches.

        Number of fragments generated, for each rasterizer
        Number of fragments passing depth test, for each rasterizer

            Together with other statistics, these help estimate
            average triangle size, depth complexity, and effectiveness
            of depth sorting.
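
As an illustration of how the fragment counts might be combined with other statistics, here is a minimal Python sketch. The function name, its parameters, and the formulas are assumptions made for this example, not part of any proposal: it takes the number of primitives that reached rasterization and the viewport size in pixels as known, and treats the per-rasterizer counts as simply additive.

```python
def derived_stats(fragments_generated, fragments_passed,
                  primitives_rasterized, viewport_pixels):
    """Estimate per-frame figures from raw counters (hypothetical helper).

    fragments_generated and fragments_passed are lists with one entry
    per rasterizer; the other two arguments are plain integers.
    """
    if primitives_rasterized <= 0 or viewport_pixels <= 0:
        raise ValueError("primitive and pixel counts must be positive")

    generated = sum(fragments_generated)
    passed = sum(fragments_passed)

    return {
        # Average number of fragments produced per primitive.
        "avg_primitive_area": generated / primitives_rasterized,
        # Average number of fragments generated per screen pixel.
        "depth_complexity": generated / viewport_pixels,
        # Fraction of generated fragments that survived the depth test.
        "depth_pass_rate": passed / generated if generated else 0.0,
    }


stats = derived_stats(
    fragments_generated=[1_200_000, 800_000],   # two rasterizers
    fragments_passed=[300_000, 200_000],
    primitives_rasterized=40_000,
    viewport_pixels=1000 * 500,
)
print(stats)
```

For opaque geometry drawn with the usual less-than depth test, a pass rate close to 1/depth_complexity suggests the scene is reaching the pipe roughly front-to-back, while a pass rate close to 1 suggests most fragments are overwriting earlier ones.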

        Open Issues: Is there a way to track the number of bytes
        processed by CopyPixels-style operations? These aren't
        accounted for by the transfers to and from the pipe.

    Texture Statistics

        Number of texture binds performed

            Pinpoints an important attribute-change bottleneck.

        Number of TexImage/TexSubImage commands
        Number of CopyTexImage/CopyTexSubImage commands
        Number of texture downloads initiated by texture manager
        Number of GetTexImage commands
        Number of texture uploads initiated by texture manager

            Together with other stats, determines cost of texture
            management operations.

        Texture memory utilization

            Initial/Max/Min/Final fraction of texture memory in use
            over the measurement interval.

        Open Issues: Number of texture fetches, per rasterizer?

    Timing Measurements

        Return these times for all commands appearing between two
        ``bracketing'' commands issued by the app:

            Host CPU time (usecs)
            Geometry (total for vector and scalar units) processing
                time (usecs)
            Rasterization (for each rasterizer) processing time (usecs)
            Wall clock time (usecs)

        Note that the above measurements should reflect the ``useful
        work'' performed by the associated pipe stages; they should be
        repeatable no matter what is in the pipe before the first
        bracketing command is issued and no matter what is placed in
        the pipe after the second bracketing command is issued. (Thus,
        counting FIFO full/empty states isn't sufficient.)

Instruments NOT Recommended

    Number of FIFO high-water interrupts

        Not sure this is needed. Provided we do a good job of
        accounting for time spent in each stage of the pipe, that
        accounting should be of more use than the raw number of
        interrupts, and interpreting it should involve less
        system-dependent code.

    Number of graphics context switches

        Superseded by recording the number of MakeCurrent commands
        (which should be more useful on a per-context basis than the
        global number of context switches per pipe).

    Number of geometric primitives scissored

        See note under Issues/Resolutions below.

    Number of bytes transferred due to DrawPixels/Bitmap commands
    Number of bytes transferred due to ReadPixels commands
    Number of bytes transferred due to CopyPixels commands
    Number of bytes of texture data transferred as a result of
        TexImage, CopyTexImage, GetTexImage, etc.

        These seem reasonable, but I suspect we'll get adequate
        bang-for-the-buck just by counting the number of bytes
        transferred to/from the pipe. (Tracking bytes transferred for
        Copy* operations is an open issue.)

    Coarse Z-culling stats of some kind?

        My current guess is that if we can provide statistics on the
        number of fragments generated and the number of fragments
        passing the depth test, it's unlikely we'll need more stats on
        coarse Z-culling.

Issues/Resolutions

    In principle, the application can handle some of the measurements
    described above (counting the number of times a given command is
    executed, for example). Should we bother implementing instruments
    to capture such measurements?

        I believe we should. Although it makes good design sense to
        avoid duplicating what's easily accomplished in the apps,
        there are two problems with requiring users to make
        measurements on their own:

        (1) Doing so could require wholesale changes to source code.
            (Consider what would be needed to handle display lists
            correctly.) It's unlikely many users would do this.

        (2) Users typically don't have access to the source code for
            high-level libraries that issue OpenGL commands, so
            requiring source code changes makes it impractical for
            them to measure the commands executed by those libraries.

    Why not use a library like GLS or a utility like ogldebug to trace
    OpenGL commands and make such measurements?

        Good arguments have been made for this, but I'm not completely
        convinced. In some cases, using GLS or ogldebug mitigates the
        problems mentioned above. For example, it would be easier to
        maintain counts of the number of times a command is executed,
        since no access to source code is needed.
        (Handling display lists correctly seems possible, though it
        would require a good bit of work, especially for shared
        dlists.)

        There are problems merging the results of counts from the
        tracing utilities with timing measurements made by other
        instruments. The tracing utilities would need to interpret
        the instrumentation commands to know when to start and stop
        counting. The counts wouldn't be available to the application
        under test, so it couldn't make on-the-fly decisions based on
        them. Also, in many cases I suspect it's more work to put
        this functionality into the tracing utilities than it is to
        fold the functionality into the instruments. Counting pixel
        and texture commands might be accomplished with just a few
        lines of microcode, for example.

    It's difficult to measure the number of scissored geometric
    primitives, because a primitive may be scissored in one rasterizer
    but not in others. Determining which primitives have been
    scissored essentially requires tagging each primitive so that the
    status from all rasterizers can be combined meaningfully.

        Good point. That statistic has been dropped from the current
        proposal.

    It would be worthwhile to consider instruments that would help
    debug performance problems, but would not necessarily be exposed
    for general use. (A count of the number of cycles for which each
    type of memory request [texture, video, command fifo, etc.]
    stalls, for example.)

        Yes. The proposal now mentions a ``Debug/Test'' category of
        instruments.

    Beware of adding readable hardware counters, particularly when
    they affect multiple blocks of logic and software (consider
    testability, new special command packets that would be required,
    context switching, etc.).

        True. Not all of these instruments will be practical.

    For multiple geometry engines, some measurements will need to be
    maintained on a per-GE basis. The extension spec must reflect this
    (as it must reflect the existence of multiple rasterizers).
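
One way an app might fold per-unit measurements back into a single figure is sketched below in Python. Everything here is an assumption for illustration: the function name, the dictionary keys, and in particular the choice to add counts across units while reporting the largest busy time, which only makes sense if the geometry engines (or rasterizers) work in parallel on the same bracketed commands.

```python
def combine_units(per_unit):
    """Combine measurements reported separately by each unit.

    per_unit is a list of dicts, one per geometry engine or rasterizer,
    each with a 'primitives' count and a 'busy_usecs' time. The keys are
    hypothetical; a real instrument would define its own record layout.
    """
    if not per_unit:
        raise ValueError("need measurements from at least one unit")

    busy = [unit["busy_usecs"] for unit in per_unit]
    return {
        # Counts are additive across units.
        "primitives": sum(unit["primitives"] for unit in per_unit),
        # With parallel units, the slowest one bounds the stage time.
        "busy_usecs_max": max(busy),
        # Keep the per-unit times so load imbalance stays visible.
        "busy_usecs_by_unit": busy,
    }


combined = combine_units([
    {"primitives": 1000, "busy_usecs": 400},
    {"primitives": 3000, "busy_usecs": 900},
])
print(combined)
```

Keeping the per-unit list alongside the combined values matters for the Debugging/Testing use: a large gap between units is itself a useful signal.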

--
Dri-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dri-devel