Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > You know what they found out with all of the hundreds of millions of
> > dollars they spent?  Dedicated hardware still does it faster and
> > cheaper.  Period.  It's just like writing a custom routine to sort an
> > array will pretty much always be faster than using the generic qsort.
> > When you hand-tune for a specific data set you will always get better
> > performance.  This is not to say that the generic implementation will
> > not perform well or even acceptably well, but only to say that it
> > will never, ever, ever perform better.
>
> Here you are comparing different algorithms.  A custom sort algorithm
> will perform much better than a standard qsort.  I agree.  Implementing
> something in hardware does not mean it uses a more efficient algorithm,
> however.  A hardware implementation is just that, an implementation.
> It does not change the underlying algorithms that are being used.  In
> fact, it tends to set the algorithm in stone.  This makes it very hard
> to adopt new, better algorithms as they are invented.  In order to move
> to a better algorithm you must wait for a hardware manufacturer to
> implement it and then fork out more money.
>
> Dedicated hardware can do a limited set of things faster.  There is no
> way to increase its capabilities without purchasing new hardware.  This
> is the weakness of having dedicated hardware for very specific
> functionality.  If a better algorithm is invented, it can take an
> extremely long time for it to be brought to market, if it is at all,
> and it will cost yet more money.  Software has the advantage of being
> able to implement new algorithms much more quickly.  If a new algorithm
> is found to be that much better than the old, a software implementation
> of this algorithm will in fact outperform a hardware implementation of
> the older algorithm.  Algorithms are at least an order of magnitude
> more important than the implementation itself.
>
> -Raystonn

Yes.  Choosing the correct (best) algorithm for a given problem will
reduce the calculation cost with the most significance.  Yes, once a
piece of silicon is etched, it is 'in stone' as to the feature set it
provides, and yes, if you want the latest and greatest feature set in
silicon you'll always have to fork out more money.  That's how it's
always been, and always will be.

However, none of the commodity general-purpose CPUs are designed for
highly parallel execution of parallelizable algorithms -- which just
about every graphics operation is.  How many pixels can a 2GHz Athlon
process at a time?  Usually just one.  How many can dedicated silicon?
Mostly limited by how many can be fetched from memory at a time.  Thus,
the algorithm is _not_ always an order of magnitude more important than
the implementation itself -- especially if a parallelized implementation
can provide orders of magnitude more performance than a serial
implementation of the same or an even superior algorithm.

It remains a fact that in many cases where graphics algorithms are
concerned, even less efficient algorithms implemented in a highly
parallel fashion in specialized silicon (even _old_ silicon -- the
Voodoo2) can still significantly outperform the snazziest new algorithm
implemented serially in software on even a screaming-fast
general-purpose CPU.  (See the links in the thread comparing hardware
with a Voodoo2 vs. software on a 1+ GHz Athlon.)

Nick

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
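Nick's "one pixel at a time" point can be made concrete.  The sketch
below (a minimal Python illustration, not Mesa code) shows the scalar
per-pixel alpha-blend loop a software rasterizer runs serially; dedicated
silicon applies the same arithmetic to many pixels in parallel.

```python
# Scalar alpha blending (source-alpha over destination), the operation a
# software renderer performs one pixel per loop iteration.  Channel
# values are 0-255 integers; alpha is the fourth source component.

def blend_pixel(src, dst):
    """Blend one RGBA source pixel over one RGB destination pixel."""
    a = src[3]
    return tuple((src[c] * a + dst[c] * (255 - a)) // 255 for c in range(3))

def blend_span(src_span, dst_span):
    # The serial inner loop: each framebuffer pixel is touched in turn.
    return [blend_pixel(s, d) for s, d in zip(src_span, dst_span)]

# A fully opaque source replaces the destination; fully transparent keeps it.
assert blend_pixel((200, 100, 50, 255), (0, 0, 0)) == (200, 100, 50)
assert blend_pixel((200, 100, 50, 0), (10, 20, 30)) == (10, 20, 30)
assert blend_span([(255, 0, 0, 255)], [(0, 0, 0)]) == [(255, 0, 0)]
```

On a general-purpose CPU of the era, every iteration of that inner loop
costs instructions and memory traffic per pixel, which is exactly the
serial bottleneck Nick describes.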
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > Here you are comparing different algorithms.  A custom sort algorithm
> > will perform much better than a standard qsort.  I agree.
> > Implementing something in hardware does not mean it uses a more
> > efficient algorithm, however.  A hardware implementation is just
> > that, an implementation.  It does not change the underlying
> > algorithms that are being used.  In fact, it tends to set the
> > algorithm in stone.  This makes it very hard to adopt new, better
> > algorithms as they are invented.  In order to move to a better
> > algorithm you must wait for a hardware manufacturer to implement it
> > and then fork out more money.
>
> As far as I know, every new graphics chip out there right now is
> programmable - it may have a limited number of operands but the
> microcode is certainly modifiable.  They aren't just straight ASICs.

The chips may (or may not, I have not double-checked) be somewhat
programmable, but the arrangement of the chips in the pipeline is not.
Thus, the implementation of whatever algorithm they use can be tweaked
somewhat, but the algorithm is pretty much hard-coded.

-Raystonn
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Thu, 11 Apr 2002 00:26:17 -0700 "Raystonn" <[EMAIL PROTECTED]> wrote:

> Here you are comparing different algorithms.  A custom sort algorithm
> will perform much better than a standard qsort.  I agree.  Implementing
> something in hardware does not mean it uses a more efficient algorithm,
> however.  A hardware implementation is just that, an implementation.
> It does not change the underlying algorithms that are being used.  In
> fact, it tends to set the algorithm in stone.  This makes it very hard
> to adopt new, better algorithms as they are invented.  In order to move
> to a better algorithm you must wait for a hardware manufacturer to
> implement it and then fork out more money.

As far as I know, every new graphics chip out there right now is
programmable - it may have a limited number of operands but the
microcode is certainly modifiable.  They aren't just straight ASICs.

David Bronaugh
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> You know what they found out with all of the hundreds of millions of
> dollars they spent?  Dedicated hardware still does it faster and
> cheaper.  Period.  It's just like writing a custom routine to sort an
> array will pretty much always be faster than using the generic qsort.
> When you hand-tune for a specific data set you will always get better
> performance.  This is not to say that the generic implementation will
> not perform well or even acceptably well, but only to say that it will
> never, ever, ever perform better.

Here you are comparing different algorithms.  A custom sort algorithm
will perform much better than a standard qsort.  I agree.  Implementing
something in hardware does not mean it uses a more efficient algorithm,
however.  A hardware implementation is just that, an implementation.  It
does not change the underlying algorithms that are being used.  In fact,
it tends to set the algorithm in stone.  This makes it very hard to
adopt new, better algorithms as they are invented.  In order to move to
a better algorithm you must wait for a hardware manufacturer to
implement it and then fork out more money.

Dedicated hardware can do a limited set of things faster.  There is no
way to increase its capabilities without purchasing new hardware.  This
is the weakness of having dedicated hardware for very specific
functionality.  If a better algorithm is invented, it can take an
extremely long time for it to be brought to market, if it is at all, and
it will cost yet more money.  Software has the advantage of being able
to implement new algorithms much more quickly.  If a new algorithm is
found to be that much better than the old, a software implementation of
this algorithm will in fact outperform a hardware implementation of the
older algorithm.  Algorithms are at least an order of magnitude more
important than the implementation itself.

-Raystonn
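The qsort analogy both sides accept can be illustrated.  The following
is a minimal Python sketch (not from the thread) of a sort specialized
for a known data set: when values are known to be 8-bit, a counting sort
beats any generic comparison sort by exploiting that knowledge, which is
the sense in which "hand-tuning for a specific data set" wins.

```python
import random

def counting_sort_bytes(data):
    """Sort values known to lie in 0..255.

    Runs in O(n + 256) by exploiting knowledge of the data set, where a
    generic comparison sort such as qsort is bound to O(n log n)."""
    counts = [0] * 256
    for v in data:
        counts[v] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)
    return out

data = [random.randrange(256) for _ in range(1000)]
assert counting_sort_bytes(data) == sorted(data)
```

The specialized routine gives the same answer as the generic one, only
faster on its intended inputs -- and is useless outside them, mirroring
the fixed-function-hardware trade-off being debated.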
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Tue, Apr 09, 2002 at 06:29:54PM -0700, Raystonn wrote:

> First off, current market leaders began their hardware designs back
> when the main CPU was much, much slower.  They have an investment in
> this technology and likely do not want to throw it away.  Back when
> these companies were founded, such 3d rendering could not be performed
> on the main processor at all.  The computational power of the main
> processor has since increased dramatically.  The algorithmic approach
> to 3d rendering should be reexamined with current and future hardware
> in mind.  What was once true may no longer be so.
>
> Second, if a processor-intensive algorithm was capable of better
> efficiency than a bandwidth-intensive algorithm, there is a good
> chance these algorithms would be moved back over to the main CPU.  If
> the main processor took over 3D rendering, what would the 3D card
> manufacturers sell?  It would put them out of business, essentially.
> Therefore you cannot gauge what is the most efficient algorithm based
> on what the 3D card manufacturers decide to push.  They will push
> whatever is better for their bottom line and their own future.

I'm getting very tired of this thread.  If modern CPUs are so much
better for 3D, then why does Intel, of all companies, still make its own
3D hardware in addition to CPUs?!?  If the main CPU was so wonderful for
3D rendering, Intel would be all over it.  In fact, they tried to push
that agenda once when MMX first became available.  Remember?  Had it
come out before the original Voodoo Graphics, things might have been
different for a time.

You know what they found out with all of the hundreds of millions of
dollars they spent?  Dedicated hardware still does it faster and
cheaper.  Period.  It's just like writing a custom routine to sort an
array will pretty much always be faster than using the generic qsort.
When you hand-tune for a specific data set you will always get better
performance.  This is not to say that the generic implementation will
not perform well or even acceptably well, but only to say that it will
never, ever, ever perform better.

--
Tell that to the Marines!
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > If you want to get back to the topic of software rendering, I would
> > be more than happy to oblige.
>
> Better yet, if you are serious - how about furthering your argument
> with patches to optimise and improve Mesa's software paths?

Patches will not do the job.  My ideas include a change in algorithm,
not implementation.  This would involve a huge redesign.  I have SGI's
SI and have been mucking about with it attempting to bring some kind of
order to it.  Once I complete that I will likely start over from scratch
and implement a new design based on scene-capturing and some algorithms
I have created.  Of course I will ensure it passes conformance tests in
the end.  What good is an OpenGL implementation if it does not work as
advertised?  ;)

-Raystonn
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Tue, 9 Apr 2002, Raystonn wrote:

> If you want to get back to the topic of software rendering, I would be
> more than happy to oblige.

Better yet, if you are serious - how about furthering your argument with
patches to optimise and improve Mesa's software paths?
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> I agree.  You may want to take a look at the following article:
>
> http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1
>
> It shows, among other things, a 400MHz PII with a 3dfx Voodoo2
> (hardware rasterization) getting almost double the framerate of a
> 1.4GHz Athlon doing software rendering with Quake2 -- and the software
> rendering is not even close to the quality of the hardware rendering
> due to all the shortcuts being taken.

A software implementation of an immediate mode renderer would indeed be
extremely slow.  The main CPU does not yet have access to the kinds of
memory bandwidth that a 3D card does.  I believe a software
implementation of a scene-capture tile renderer would have much better
results.  This is a more computationally expensive, less
bandwidth-intensive algorithm which is more suited to a CPU's
environment.

-Raystonn
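The scene-capture (deferred, tile-based) approach Raystonn advocates can
be sketched.  The Python below is an illustrative toy, not any real
design: it uses axis-aligned rectangles in place of triangles and a
4-pixel tile for brevity.  The whole scene is captured first; each
framebuffer pixel is then written exactly once, trading per-pixel
computation for framebuffer bandwidth.

```python
# Minimal scene-capture tile renderer.  Primitives are captured up
# front, binned per tile, and each pixel is resolved with a single
# write: zero overdraw for opaque geometry.

TILE = 4  # tile edge in pixels; real hardware tiles were larger

def render_deferred(width, height, rects):
    """rects: list of (x0, y0, x1, y1, depth, color) axis-aligned quads."""
    fb = [[None] * width for _ in range(height)]
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            # Bin: keep only primitives touching this tile.
            binned = [r for r in rects
                      if r[0] < tx + TILE and r[2] > tx
                      and r[1] < ty + TILE and r[3] > ty]
            for y in range(ty, min(ty + TILE, height)):
                for x in range(tx, min(tx + TILE, width)):
                    # Nearest covering primitive wins; one write per pixel.
                    best = min((r for r in binned
                                if r[0] <= x < r[2] and r[1] <= y < r[3]),
                               key=lambda r: r[4], default=None)
                    fb[y][x] = best[5] if best else 0
    return fb

scene = [(0, 0, 8, 8, 5.0, 1),   # far background quad
         (2, 2, 6, 6, 1.0, 2)]   # near quad occluding its middle
fb = render_deferred(8, 8, scene)
assert fb[0][0] == 1 and fb[4][4] == 2
```

The per-pixel visibility search is the extra computation; the single
write per pixel is the bandwidth saving, which is why the technique
suits a fast CPU with a slow memory bus.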
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > With the rest I disagree.  The Kyro, for example, has some
> > high-speed local memory (cache) it uses to hold the pixels for a
> > tile.  It can antialias and render translucent scenes without ever
> > blitting the cache to the framebuffer more than once.
>
> It can't have infinite storage for tile information - so there would
> have to be a hard limit on the number of translucent/edge/intersecting
> tiles - that would not be OpenGL compliant.

Each tile has a list of all polygons that might be drawn on its pixel
cache.  For a hardware implementation, memory could become an issue.  If
memory gets tight, the implementation could render all polygons
currently in its tile lists and clear out tile memory.  This would trade
off memory space for a bit of overdraw.  For a software implementation
this would really not be a problem.

> > This is the advantage to tile-based rendering.  Since you only need
> > to hold a tile's worth of pixels, you can use smaller high-speed
> > cache.
>
> Only if the number of visible layers is small enough.

What does the number of visible layers have to do with the ability to
break down processing to a per-tile basis?  I am not following here.

> > As far as the reading of pixels from the framebuffer, this is a
> > highly inefficient thing to do, no matter the hardware.  If you want
> > a fast application you will not attempt to read from the video
> > card's memory.  These operations are always extremely slow.
>
> They are only slow if the card doesn't implement them well - but there
> are plenty of techniques (eg impostering) that rely on this kind of
> thing.

This will always be slow.  AGP transfers are inherently slow compared to
everything else.  DMA transfers are usually used for writing to the
video card, usually passing it vertices and textures.  These transfers
can be done in the background.  How many video cards implement DMA
transfers for reads from the video card?  Even if this was done, most of
the time you are waiting for the results of the read in order to perform
another operation.  What good is pushing something into the background
when you must wait for that operation to complete?  These slow transfers
and all the waiting make this an extremely slow process.  It is not
recommended.

-Raystonn
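Raystonn's fallback for tight tile memory -- render everything currently
binned, then clear the lists -- can be sketched.  Everything below is
illustrative (the budget constant, class, and callback are hypothetical,
not from any real design), but it shows the memory-for-overdraw trade.

```python
# When a per-tile primitive list exceeds a memory budget, resolve the
# bin early and clear it.  Each early flush can shade some pixels that a
# later primitive would have occluded, i.e. a bit of overdraw.

MAX_BINNED = 64  # hypothetical per-tile primitive budget

class TileBin:
    def __init__(self):
        self.prims = []
        self.flushes = 0  # each early flush implies possible overdraw

    def add(self, prim, resolve):
        self.prims.append(prim)
        if len(self.prims) >= MAX_BINNED:
            self.flush(resolve)

    def flush(self, resolve):
        if self.prims:
            resolve(self.prims)  # draw everything binned so far
            self.prims = []
            self.flushes += 1

batches = []
tile = TileBin()
for i in range(150):
    tile.add(i, batches.append)
tile.flush(batches.append)
# 150 primitives under a 64-entry budget: two early flushes plus the final one.
assert tile.flushes == 3
assert sum(len(b) for b in batches) == 150
```

As Raystonn notes, a software implementation can simply grow the lists
in main memory, so this fallback matters mainly for hardware with fixed
on-chip storage.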
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > I agree with the "you have to read pixels back from the frame buffer
> > and then continue rendering polygons."  For a hardware implementation
> > I might agree with the "you need to draw more polygons than your
> > hardware has room to store", but only if the hardware implementation
> > decides to perform overdraw rather than fetching the triangles on the
> > fly from AGP memory.
>
> You need to agree that current hardware does implement the scheme
> where some percentage of pixels is drawn multiple times.  It's a
> straightforward hardware design that nicely opens ways for getting the
> performance with an affordable amount of ASIC design engineering
> power.  I don't assume the current market leaders would have chosen
> that way if they expected to get more performance from the other
> approaches.  In the end I am pretty sure that this approach provides
> more ways for interesting features and effects than the mentioned
> one-pass rendering would provide.

First off, current market leaders began their hardware designs back when
the main CPU was much, much slower.  They have an investment in this
technology and likely do not want to throw it away.  Back when these
companies were founded, such 3d rendering could not be performed on the
main processor at all.  The computational power of the main processor
has since increased dramatically.  The algorithmic approach to 3d
rendering should be reexamined with current and future hardware in mind.
What was once true may no longer be so.

Second, if a processor-intensive algorithm was capable of better
efficiency than a bandwidth-intensive algorithm, there is a good chance
these algorithms would be moved back over to the main CPU.  If the main
processor took over 3D rendering, what would the 3D card manufacturers
sell?  It would put them out of business, essentially.  Therefore you
cannot gauge what is the most efficient algorithm based on what the 3D
card manufacturers decide to push.  They will push whatever is better
for their bottom line and their own future.

> Anyways, the current memory interfaces for the framebuffer memory
> aren't the performance break at all today.  It's the features that the
> applications do demand, e.g. n-times texturing.

The features of most games today do cause the current memory interfaces
to be the performance bottleneck.  This is why overclocking your card's
memory offers more of a performance gain than overclocking your card's
processor.

> If these one-pass algorithms would be so resource saving, why is there
> only a single hardware implementation and the respective software
> solutions are of not much attention?

Why should a 3D card hardware company show interest in something that
could so easily be implemented in software?  How does that benefit their
bottom lines?

> > With the rest I disagree.  The Kyro, for example, has some
> > high-speed local memory (cache) it uses to hold the pixels for a
> > tile.  It can antialias and render translucent scenes without ever
> > blitting the cache to the framebuffer more than once.  This is the
> > advantage to tile-based rendering.  Since you only need to hold a
> > tile's worth of pixels, you can use smaller high-speed cache.
>
> Pixel caches and tiled framebuffers/textures are state of the art for
> most (if not all) of current engines.  Only looking at the Kyro would
> draw a false view of the market.  Kyro has it too, so it's sort of a
> "me too product".  But a vendor's marketing department will never tell
> you that it is this way.

No, tile buffers cannot be used by immediate mode renderers to eliminate
overdraw.  Immediate mode does not render on a per-pixel basis.  It
renders on a per-polygon basis.  Current hardware engines that use
immediate mode rendering in fact do not make use of tile-based
rendering.  They would need a "tile buffer" the size of the entire
framebuffer.  At that point it is no longer a high-speed buffer.  It is
simply the framebuffer.  Imagine the cost of high-speed cache in
quantities large enough to hold a full framebuffer, especially at high
resolutions...

While I would prefer to see a software implementation of scene-capture
tile-based rendering, the Kyro was a good first step.  It was the first
mainstream card to use these algorithms.  For this I applaud them.  This
was by no means a "me too product" as you claimed.

> > As far as the reading of pixels from the framebuffer, this is a
> > highly inefficient thing to do, no matter the hardware.  If you want
> > a fast application you will not attempt to read from the video
> > card's memory.  These operations are always extremely slow.
>
> For this there are caches (most often generic for nearly any render
> unit).  And reading is not that different from writing on current RAM
> designs.  Some reading is always working without any noticeable impact
> on performance (and it's done for a good bunch of applications and
> features), but if you need much data from the framebuffer, then you
> might notice it.
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
Stephen J Baker wrote:

>>>> Everything starts out in hardware and eventually moves to software.
>>>
>>> That's odd - I see the reverse happening.  First we had software
>>
>> The move from hardware to software is an industry-wide pattern for
>> all technology.  It saves money.  3D video cards have been
>> implementing new technologies that were never used in software
>> before.  Once the main processor is able to handle these things, they
>> will be moved into software.  This is just a fact of life in the
>> computing industry.  Take a look at what they did with "Winmodems".
>> They removed hardware and wrote drivers to perform the tasks.  The
>> same thing will eventually happen in the 3D card industry.
>
> That's not quite a fair comparison.

I agree.  You may want to take a look at the following article:

http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1

It shows, among other things, a 400MHz PII with a 3dfx Voodoo2 (hardware
rasterization) getting almost double the framerate of a 1.4GHz Athlon
doing software rendering with Quake2 -- and the software rendering is
not even close to the quality of the hardware rendering due to all the
shortcuts being taken.

What we are seeing, throughout the industry, is a move to programmable
graphics engines rather than fixed-function ones.  Programmable vertex
and fragment pipelines are not the same as a software implementation on
a general purpose CPU, as the underlying hardware still has the special
functionality needed for 3D graphics.  I suspect that this will continue
to be true for a very, very long time.

--
Gareth
RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> I agree with the "you have to read pixels back from the frame buffer
> and then continue rendering polygons."  For a hardware implementation
> I might agree with the "you need to draw more polygons than your
> hardware has room to store", but only if the hardware implementation
> decides to perform overdraw rather than fetching the triangles on the
> fly from AGP memory.

You need to agree that current hardware does implement the scheme where
some percentage of pixels is drawn multiple times.  It's a
straightforward hardware design that nicely opens ways for getting the
performance with an affordable amount of ASIC design engineering power.
I don't assume the current market leaders would have chosen that way if
they expected to get more performance from the other approaches.  In the
end I am pretty sure that this approach provides more ways for
interesting features and effects than the mentioned one-pass rendering
would provide.

Anyways, the current memory interfaces for the framebuffer memory aren't
the performance break at all today.  It's the features that the
applications do demand, e.g. n-times texturing.

If these one-pass algorithms would be so resource saving, why is there
only a single hardware implementation, and why do the respective
software solutions get so little attention?  The only reason I can see
is that it does not work as effectively and performance-increasing.  To
be honest you must subtract the preprocessing time from the rendering
gain.  And you must expect the adapters not rendering at full speed
because they are running idle for some time due to CPU reasons.

> With the rest I disagree.  The Kyro, for example, has some high-speed
> local memory (cache) it uses to hold the pixels for a tile.  It can
> antialias and render translucent scenes without ever blitting the
> cache to the framebuffer more than once.  This is the advantage to
> tile-based rendering.  Since you only need to hold a tile's worth of
> pixels, you can use smaller high-speed cache.

Pixel caches and tiled framebuffers/textures are state of the art for
most (if not all) of current engines.  Only looking at the Kyro would
draw a false view of the market.  Kyro has it too, so it's sort of a "me
too product".  But a vendor's marketing department will never tell you
that it is this way.

> As far as the reading of pixels from the framebuffer, this is a highly
> inefficient thing to do, no matter the hardware.  If you want a fast
> application you will not attempt to read from the video card's memory.
> These operations are always extremely slow.

For this there are caches (most often generic for nearly any render
unit).  And reading is not that different from writing on current RAM
designs.  Some reading is always working without any noticeable impact
on performance (and it's done for a good bunch of applications and
features), but if you need much data from the framebuffer, then you
might notice it.  The closer the pixel-consuming circuit is to the RAM,
the better it will work.  A CPU is one of the not-so-good consumers of
pixels.

> I still maintain that immediate mode rendering is an inefficient
> algorithm designed to favor the use of memory over computations.

Hmm, the current state of the art is called display-list-based
rendering, and it is up to date and nicely optimized despite the concept
being an older one.  It takes the goods of both worlds: fast overdrawing
rendering into memory and a higher level of primitive preprocessing.
With only a single comparison on a preprocessed display list you can
quickly decide whether that display list needs to be sent to the
graphics adapter.

Just believe that the performance is only at an optimum level if you are
able to take the best of the two worlds - extreme overdraw rendering is
neither good for performance, nor is intense geometrical preprocessing
on a per-frame basis a viable way to performance.  The hardware industry
has found nice ways of combining both of these technologies to provide
you the best of both worlds and thus the highest performance.  And they
are further developing in both of those areas and a few others more.

Regards,
Alex.
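Alex's display-list point -- one cheap comparison decides whether cached,
preprocessed geometry can be replayed instead of re-sent -- can be
sketched.  The Python below is an illustrative model (the class, version
counters, and "bus" list are hypothetical, not a real driver API).

```python
# A display list caches geometry in compiled form.  On submission, a
# single version comparison decides between replaying the cached list by
# handle and re-uploading every command.

class DisplayList:
    def __init__(self, commands):
        self.commands = list(commands)  # "compiled" once at build time
        self.compiled_version = 0       # nothing uploaded yet
        self.source_version = 1

    def edit(self, commands):
        self.commands = list(commands)
        self.source_version += 1        # cached copy is now stale

    def submit(self, bus):
        if self.compiled_version == self.source_version:
            bus.append(("replay", id(self)))  # one comparison, one handle
        else:
            bus.extend(("upload", c) for c in self.commands)
            self.compiled_version = self.source_version

bus = []
dl = DisplayList(["tri0", "tri1"])
dl.submit(bus)  # first frame: full upload of both triangles
dl.submit(bus)  # second frame: replayed by handle, no re-upload
assert bus[0] == ("upload", "tri0")
assert bus[2][0] == "replay"
```

The saving is exactly the "higher level of primitive preprocessing" Alex
describes: per-frame work drops from per-primitive to per-list.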
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > I still maintain that immediate mode rendering is an inefficient
> > algorithm designed to favor the use of memory over computations.  A
> > better algorithm will always win out given enough time to overtake
> > the optimized versions of the more inefficient algorithms.
>
> Perhaps you've forgotten what you originally said?  The Kyro is a
> graphics card.
>
> But still, hand-waving v real-world pragmatic performance figures
> matter more, and here your Kyro and P4 lose.
>
> It really doesn't matter if algo (a) is better than (b).  To progress
> your argument you need to prove[1] that algo (a) is at least as good
> as, and as cheap, in software on the P4 as either some other algo or
> the same one in a graphics card.  Whilst still allowing that processor
> to perform other functions.
>
> [1] with numbers not with rhetoric.

The first paragraph (the one you chose to quote) has nothing to do with
implementing it in software.  That was an entirely different discussion.
This discussion is currently about the new topic of whether or not
scene-capture tile-based rendering is more efficient than immediate mode
rendering.  I maintain that it is, and have included my arguments in my
last post.

If you want to get back to the topic of software rendering, I would be
more than happy to oblige.  But please don't quote arguments for a point
in one debate and show them to be inadequate for proving a point in a
prior debate.  The top paragraph was not intended to support any
argument regarding software rendering.

-Raystonn
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Mon, Apr 08, 2002 at 06:17:59PM -0700, Raystonn wrote:

> I still maintain that immediate mode rendering is an inefficient
> algorithm designed to favor the use of memory over computations.  A
> better algorithm will always win out given enough time to overtake the
> optimized versions of the more inefficient algorithms.

Perhaps you've forgotten what you originally said?  The Kyro is a
graphics card.

But still, hand-waving v real-world pragmatic performance figures matter
more, and here your Kyro and P4 lose.

It really doesn't matter if algo (a) is better than (b).  To progress
your argument you need to prove[1] that algo (a) is at least as good as,
and as cheap, in software on the P4 as either some other algo or the
same one in a graphics card.  Whilst still allowing that processor to
perform other functions.

[1] with numbers not with rhetoric.

--
Michael.
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Mon, Apr 08, 2002 at 06:17:59PM -0700, Raystonn wrote:

| As far as the reading of pixels from the framebuffer, this is a highly
| inefficient thing to do, no matter the hardware.

It doesn't have to be; that's just a tradeoff made by the hardware
designers depending on the applications for which their systems are
intended.  Reading previously-rendered pixels is useful for things like
dynamically-constructed environment maps, shadow maps, correction for
projector optics, film compositing, and parallel renderers.  There are
various ways hardware can assist these operations, and various ways
tiled renderers interact with them, but that discussion is too lengthy
for this note.  At any rate, the ability to use the results of previous
renderings is a pretty important capability.

| I still maintain that immediate mode rendering is an inefficient
| algorithm designed to favor the use of memory over computations.

An important design characteristic of immediate mode is that it allows
the application to determine the rendering order.  This helps achieve
certain rendering effects (such as those Steve described earlier), but
it can also be a *huge* efficiency win if the scene involves expensive
mode changes, such as texture loads/unloads.  Check out the original
Reyes paper for a good quantitative discussion of this sort of issue.

Allen
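Allen's point about application-controlled ordering can be made
concrete.  A minimal Python sketch (illustrative, not any real API):
with immediate-mode submission the application can sort its draws by
texture, so an expensive bind or load happens once per texture rather
than once per draw.

```python
# Count expensive state changes for a stream of immediate-mode draws.
# Each change of bound texture stands in for a costly load/unload.

def count_texture_binds(draws):
    """draws: sequence of (texture, mesh) pairs, submitted in order."""
    binds, bound = 0, None
    for texture, _mesh in draws:
        if texture != bound:
            binds += 1      # the expensive mode change
            bound = texture
    return binds

draws = [("brick", 0), ("wood", 1), ("brick", 2), ("wood", 3)]
assert count_texture_binds(draws) == 4            # naive order
assert count_texture_binds(sorted(draws)) == 2    # sorted by texture
```

With many textures competing for limited texture memory, that reduction
in loads is the "huge efficiency win" Allen refers to, and it is only
available because the application controls submission order.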
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > The games perform overdraw, sure. But I am talking about at the pixel > > level. A scene-capture algorithm performs 0 overdraw, regardless of what > > the game sends it. > > That's not true. I've designed and built machines like this and I know. > > You still need overdraw when: > > * you are antialiasing a polygon edge. > * you are rendering translucent surfaces. > * you need more textures on a polygon than your > hardware can render in a single pass. > * you have to read pixels back from the frame buffer and then > continue rendering polygons. > * polygons get smaller than a pixel in width or height. > * you need to draw more polygons than your hardware has > room to store. I agree with the "you have to read pixels back from the frame buffer and then continue rendering polygons." For a hardware implementation I might agree with the "you need to draw more polygons than your hardware has room to store", but only if the hardware implementation decides to perform overdraw rather than fetching the triangles on the fly from AGP memory. With the rest I disagree. The Kyro, for example, has some high-speed local memory (cache) it uses to hold the pixels for a tile. It can antialias and render translucent scenes without ever blitting the cache to the framebuffer more than once. This is the advantage to tile-based rendering. Since you only need to hold a tile's worth of pixels, you can use smaller high-speed cache. As far as the reading of pixels from the framebuffer, this is a highly inefficient thing to do, no matter the hardware. If you want a fast application you will not attempt to read from the video card's memory. These operations are always extremely slow. I still maintain that immediate mode renderering is an inefficient algorithm designed to favor the use of memory over computations. A better algorithm will always win out given enough time to overtake the optimized versions of the more inefficient algorithms. 
-Raystonn ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
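The overdraw disagreement above can be sketched numerically. Here is a toy Python illustration (made-up depths, with back-to-front submission as the worst case for immediate mode) that counts shaded fragments under immediate-mode rendering versus a scene-capture renderer that resolves visibility per pixel before shading anything:

```python
W, H = 4, 4
poly_depths = [0.9, 0.5, 0.1]   # three full-screen polygons, submitted back-to-front

# Immediate mode: depth-test each fragment as it arrives, shade on pass.
zbuf = [1.0] * (W * H)
immediate_shaded = 0
for depth in poly_depths:        # worst case: every later polygon is nearer
    for i in range(W * H):
        if depth < zbuf[i]:
            zbuf[i] = depth
            immediate_shaded += 1

# Scene capture: resolve the nearest polygon per pixel first, shade once.
deferred_shaded = 0
for i in range(W * H):
    nearest = min(poly_depths)   # visibility resolved before any shading
    if nearest < 1.0:
        deferred_shaded += 1

print(immediate_shaded, deferred_shaded)   # 48 16: an overdraw factor of 3
```

Note that with front-to-back submission the immediate-mode renderer would also shade only 16 fragments here, which is consistent with the measured factor of "between 2 and 4" for real scenes quoted later in the thread: applications already sort and cull much of the obvious overdraw.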
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Thu, 4 Apr 2002, Raystonn wrote: > The games perform overdraw, sure. But I am talking about at the pixel > level. A scene-capture algorithm performs 0 overdraw, regardless of what > the game sends it. That's not true. I've designed and built machines like this and I know. You still need overdraw when: * you are antialiasing a polygon edge. * you are rendering translucent surfaces. * you need more textures on a polygon than your hardware can render in a single pass. * you have to read pixels back from the frame buffer and then continue rendering polygons. * polygons get smaller than a pixel in width or height. * you need to draw more polygons than your hardware has room to store. ...I'm sure there are other reasons too. > This reduces fillrate needs greatly. It reduces it (in my experience) by a factor of between 2 and 4 depending on the nature of the scene. You can easily invent scenes that show much more benefit - but they tend to be contrived cases that don't crop up much in real applications because of things like portal culling. > > Also, in order to use scene capture, you are reliant on the underlying > > graphics API to be supportive of this technique. Neither OpenGL nor > > Direct3D are terribly helpful. > > Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL. They do - but they perform extremely poorly for OpenGL programs that do anything much more complicated than just throwing a pile of polygons at the display. As soon as you get into reading back pixels for any reason, any scene-capture system has to render the polygons it has before the program can access the pixels in the frame buffer. > > > Everything starts out in hardware and eventually moves to software. > > > > That's odd - I see the reverse happening. First we had software > > The move from hardware to software is an industry-wide pattern for all > technology. It saves money. 3D video cards have been implementing new > technologies that were never used in software before. 
Once the main > processor is able to handle these things, they will be moved into software. > This is just a fact of life in the computing industry. Take a look at what > they did with "Winmodems". They removed hardware and wrote drivers to > perform the tasks. The same thing will eventually happen in the 3D card > industry. That's not quite a fair comparison. Modems can be moved into software because there is no need for them *EVER* to get any faster. All modern modems can operate faster than any standard telephone line and are in essence *perfect* devices that cannot be improved upon in any way. Hence a hardware modem that would run MUCH faster than the CPU would be easy to build - but we don't because it's just not useful. That artificial limit on the speed of a modem is the only thing that allows software to catch up with hardware and make it obsolete. We might expect sound cards to go the same way - once they get fast enough to produce any conceivable audio experience that the human perceptual system can comprehend - then there is a chance for software audio to catch up. That hasn't happened yet - which is something I find rather surprising. But that's in no way analogous to the graphics situation where we'll continue to need more performance until the graphics you can draw are completely photo-realistic - indistinguishable from the real world - and operate over the complete visual field at eye-limiting resolution. We are (in my estimation) still at least three orders of magnitude in performance away from that pixel fill rate and far from where we need to be in terms of realism and polygon rates. Steve Baker (817)619-2657 (Vox/Vox-Mail) L3Com/Link Simulation & Training (817)619-2466 (Fax) Work: [EMAIL PROTECTED] http://www.link.com Home: [EMAIL PROTECTED] http://www.sjbaker.org
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > Yes, some details were left out of CPU performance increases. The same was > > done for memory performance increases though. We have been discussing > > memory bandwidth as memory performance, completely leaving out memory > > latency, which has also improved tremendously. > > Pardon me, but I haven't seen this wonderful improvement. > > I benchmarked several machines a while back: > - a P200 with a TX chipset board and EDO DRAM > - a P2-266 with an LX chipset and PC-66 SDRAM > - a K6-III/550 with a Via MVP3 chipset and PC-100 SDRAM > > The P200 pulled off about 75MBytes/sec; the P2-266 pulled off about 55 MBytes/sec; the K6-III/550 pulled off about 100MBytes/sec. All of this was done under Linux; tests were performed with memtest86 (? it's been a while, basically though they were not performed under any operating system other than that which was on the floppy). > > This doesn't support your conclusions here. I would hazard a guess that memory performance there had more to do with the chipset involved than superior memory technology. You stopped your measurements with a processor that is around 5 years old. It is no wonder you got such low results. Now compare your EDO RAM results, from about 5 years ago, to my current results: Pentium 4 2.4GHz, 400MHz FSB, i850 chipset with RDRAM: L1 cache: 19730MB/s, L2 cache: 16833MB/s, Memory: 1425MB/s My results are 19 times better than your results on the P200 with EDO RAM. This basically proves my case here. -Raystonn
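The "19 times" figure above checks out against the numbers quoted in this exchange; a one-line Python sanity check of the arithmetic:

```python
edo_mb_s = 75       # P200 + EDO DRAM copy rate measured in the quoted benchmark
rdram_mb_s = 1425   # P4 2.4GHz + dual-channel RDRAM main-memory figure quoted above
print(rdram_mb_s / edo_mb_s)   # 19.0
```

Note the comparison mixes machines, chipsets, and benchmark tools, so it bounds the combined platform improvement, not memory technology in isolation (which is exactly David's objection).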
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Fri, 5 Apr 2002 17:11:26 -0800 "Raystonn" <[EMAIL PROTECTED]> wrote: > Yes, some details were left out of CPU performance increases. The same was > done for memory performance increases though. We have been discussing > memory bandwidth as memory performance, completely leaving out memory > latency, which has also improved tremendously. Pardon me, but I haven't seen this wonderful improvement. I benchmarked several machines a while back: - a P200 with a TX chipset board and EDO DRAM - a P2-266 with an LX chipset and PC-66 SDRAM - a K6-III/550 with a Via MVP3 chipset and PC-100 SDRAM The P200 pulled off about 75MBytes/sec; the P2-266 pulled off about 55 MBytes/sec; the K6-III/550 pulled off about 100MBytes/sec. All of this was done under Linux; tests were performed with memtest86 (? it's been a while, basically though they were not performed under any operating system other than that which was on the floppy). This doesn't support your conclusions here. I would hazard a guess that memory performance there had more to do with the chipset involved than superior memory technology. David Bronaugh
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > > OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed. > > > > No, in this 5 year period, processor clockspeed has moved from approximately > > 200MHz to over 2GHz. This is a factor of 10 in CPU growth and 16 in memory > > bandwidth. Memory bandwidth is growing more quickly than processor > > clockspeed now. > > Uhm...you've fallen into Intel's clockspeed trap. I'm assuming that you're > talking about a 200MHz Pentium vs. a 2GHz Pentium4. In the best case, the > Pentium could issue two instructions at once whereas a Pentium4 or Athlon > can issue (and retire) many, many more. Not only that, the cycle times of > many instructions (such as multiply and divide) have decreased. A 2GHz > Pentium would still be crushed by a 2GHz Pentium4. Just like a 200MHz > Pentium would crush a 200MHz 286. :) Yes, some details were left out of CPU performance increases. The same was done for memory performance increases though. We have been discussing memory bandwidth as memory performance, completely leaving out memory latency, which has also improved tremendously. -Raystonn
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Thu, Apr 04, 2002 at 09:30:39PM -0800, Raystonn wrote: > > OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed. > > No, in this 5 year period, processor clockspeed has moved from approximately > 200MHz to over 2GHz. This is a factor of 10 in CPU growth and 16 in memory > bandwidth. Memory bandwidth is growing more quickly than processor > clockspeed now. Uhm...you've fallen into Intel's clockspeed trap. I'm assuming that you're talking about a 200MHz Pentium vs. a 2GHz Pentium4. In the best case, the Pentium could issue two instructions at once whereas a Pentium4 or Athlon can issue (and retire) many, many more. Not only that, the cycle times of many instructions (such as multiply and divide) have decreased. A 2GHz Pentium would still be crushed by a 2GHz Pentium4. Just like a 200MHz Pentium would crush a 200MHz 286. :) -- Tell that to the Marines!
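The "clockspeed trap" point above is easy to make concrete. The sustained instructions-per-cycle (IPC) figures below are hypothetical, purely to illustrate why the raw clock ratio understates the real throughput ratio:

```python
# Hypothetical sustained-IPC figures, for illustration only.
p200_hz, p200_ipc = 200e6, 1.0   # original Pentium: 2-wide issue, ~1 IPC sustained
p4_hz, p4_ipc = 2.0e9, 2.0       # Pentium 4: wider issue/retire, faster mul/div

clock_ratio = p4_hz / p200_hz
throughput_ratio = (p4_hz * p4_ipc) / (p200_hz * p200_ipc)
print(clock_ratio, throughput_ratio)   # 10.0 20.0
```

So even a modest IPC improvement doubles the effective gap that the clockspeed-only comparison reports.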
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> Yes - and yet they still have horrible problems every time you have > a conditional branch instruction. That's because they are trying Not really. The Pentium 4 has a very efficient branch prediction unit. Most of the time it guesses the correct branch to take. When the actual branch is computed, it stores this information for later. Next time that branch is encountered it analyzes the stored information and bases its decision on that. Conditional branches are much less of a problem now. > > Most of > > the processing power of today's CPUs goes completely unused. It is possible > > to create optimized implementations using Single-Instruction-Multiple-Data > > (SIMD) instructions of efficient algorithms. > > Which is a way of saying "Yes, you could do fast graphics on the CPU > if you put the GPU circuitry onto the CPU chip and pretend that it's > now part of the core CPU". What does this have to do with adding GPU hardware to the CPU? These SIMD instructions are already present on modern processors in the form of SSE and SSE2. > > We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM) > > to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have > > over 16 times as much memory bandwidth available today as we did just 5 years > > ago. Available memory bandwidth has been growing more quickly than > > processor clockspeed lately, and I do not foresee an end to this any time > > soon. > > OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed. No, in this 5 year period, processor clockspeed has moved from approximately 200MHz to over 2GHz. This is a factor of 10 in CPU growth and 16 in memory bandwidth. Memory bandwidth is growing more quickly than processor clockspeed now. > > Overutilised in my opinion. The amount of overdraw performed by today's > > video cards on modern games and applications is incredible. Immediate mode > > rendering is an inefficient algorithm.
Video cards tend to have extremely > > well optimized implementations of this inefficient algorithm. > > That's because games *NEED* to do lots of overdraw. They are actually The games perform overdraw, sure. But I am talking about at the pixel level. A scene-capture algorithm performs 0 overdraw, regardless of what the game sends it. This reduces fillrate needs greatly. > > Kyro-based video cards perform quite well. They are not quite up to the > > level of nVidia's latest cards... > > Not *quite*!!! Their best card is significantly slower than > a GeForce 2MX - that's four generations of nVidia technology > ago. This has nothing to do with the algorithm itself. It merely has to do with the company's ability to scale its hardware. A software implementation would not be limited in this manner. It could take advantage of the processor manufacturer's ability to scale speeds much more easily. > Also, in order to use scene capture, you are reliant on the underlying > graphics API to be supportive of this technique. Neither OpenGL nor > Direct3D are terribly helpful. Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL. Any game you can play using an nVidia card you can also play using a Kyro-based card. > > > Everything that is speeding up the main CPU is also speeding up > > > the graphics processor - faster silicon, faster busses and faster > > > RAM all help the graphics just as much as they help the CPU. > > > > Everything starts out in hardware and eventually moves to software. > > That's odd - I see the reverse happening. First we had software The move from hardware to software is an industry-wide pattern for all technology. It saves money. 3D video cards have been implementing new technologies that were never used in software before. Once the main processor is able to handle these things, they will be moved into software. This is just a fact of life in the computing industry. Take a look at what they did with "Winmodems".
They removed hardware and wrote drivers to perform the tasks. The same thing will eventually happen in the 3D card industry. > As CPU's get faster, graphics cards get *MUCH* faster. This has mostly to do with memory bandwidth. The processors on the video cards are not all that impressive by themselves. Memory bandwidth available to the CPU is increasing rapidly. > CPU's aren't "catching up" - they are getting left behind. I disagree. What the CPU lacks in hardware units it makes up with sheer clockspeed. A video card may be able to perform 10 times as many operations per clock cycle as a CPU. But if that CPU is operating at over 10 times the clockspeed, who cares? It will eventually be faster. Video card manufacturers cannot scale clockspeed anywhere near as well as Intel. > They are adding in steps to the graphics processing that are programmable. And this introduces the same problems that the main CPU is much better at dealing with. Branch prediction and other software issues have been highly
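The SSE/SSE2 point raised in this message can be illustrated even without intrinsics. The sketch below uses a classic SIMD-within-a-register trick in plain Python to average all four 8-bit channels of two packed RGBA pixels with a couple of integer operations, the same "one operation, many data elements" idea that SSE applies 128 bits at a time:

```python
def avg_rgba(a: int, b: int) -> int:
    """Per-channel (a + b) // 2 on two packed 32-bit RGBA pixels.

    (a & b) keeps the bits common to both; (a ^ b) >> 1 halves the
    differing bits. Masking with 0xFEFEFEFE before the shift stops
    carries from leaking across the 8-bit channel boundaries.
    """
    return (a & b) + (((a ^ b) & 0xFEFEFEFE) >> 1)

p0 = 0x10203040
p1 = 0x30405060
print(hex(avg_rgba(p0, p1)))   # 0x20304050: every channel averaged at once
```

An SSE2 version would do the same thing 16 channels at a time with a single instruction, which is the unused parallelism Raystonn is pointing at.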
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On Tue, 2 Apr 2002, Raystonn wrote: > > That is far from the truth - they have internal pipelining > > and parallelism. Their use of silicon can be optimised to balance > > the performance of just one single algorithm. You can never do that > > for a machine that also has to run an OS, word process and run > > spreadsheets. > > Modern processors have internal pipelining and parallelism as well. Yes - and yet they still have horrible problems every time you have a conditional branch instruction. That's because they are trying to convert a highly linear operation (code execution) into some kind of a parallel form. Graphics is easier though. Each pixel and each polygon can be treated as a stand-alone entity and can be processed in true parallelism. > Most of > the processing power of today's CPUs goes completely unused. It is possible > to create optimized implementations using Single-Instruction-Multiple-Data > (SIMD) instructions of efficient algorithms. Which is a way of saying "Yes, you could do fast graphics on the CPU if you put the GPU circuitry onto the CPU chip and pretend that it's now part of the core CPU". I'll grant you *that* - but it's not the same thing as doing the graphics in software. > > Since 1989, CPU speed has grown by a factor of 70. Over the same > > period the memory bus has increased by a factor of maybe 6 or so. > > We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM) > to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have > over 16 times as much memory bandwidth available today as we did just 5 years > ago. Available memory bandwidth has been growing more quickly than > processor clockspeed lately, and I do not foresee an end to this any time > soon. OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed. My argument remains - and remember that whenever RAM gets faster, so do the graphics cards. You can run faster - but you can't catch up if the other guy is also running faster.
> > On the other hand, the graphics card can use heavily pipelined > > operations to guarantee that the memory bandwidth is 100% utilised > > Overutilised in my opinion. The amount of overdraw performed by today's > video cards on modern games and applications is incredible. Immediate mode > rendering is an inefficient algorithm. Video cards tend to have extremely > well optimized implementations of this inefficient algorithm. That's because games *NEED* to do lots of overdraw. They are actually pretty smart about eliminating the 'obvious' cases by doing things like portal culling. Most of the overdraw comes from needing to do multipass rendering (IIRC, the new Return To Castle Wolfenstein game uses up to 12 passes to render some polygons). The overdraw due to that kind of thing is rather harder to eliminate with algorithmic sophistication. If you need that kind of surface quality, your bandwidth out of memory will be high no matter what. > Kyro-based video cards perform quite well. They are not quite up to the > level of nVidia's latest cards... Not *quite*!!! Their best card is significantly slower than a GeForce 2MX - that's four generations of nVidia technology ago. I agree that if this algorithm were to be implemented on a card with the *other* capabilities of an nVidia card - then it would improve the fill rate by perhaps a factor of two or four. (Before you argue about that - realise that I've designed *and* built hardware and software using this technology - and I've MEASURED its performance for 'typical' scenes). But you can only draw scenes where the number of polygons being rendered can fit into the 'scene capture' buffer. And that's the problem with that technology. If I want to draw a scene with a couple of million polygons in it (perfectly possible with modern cards) then those couple of million polygons have to be STORED ON THE GRAPHICS CARD. That's a big problem for an affordable graphics card.
Adding another 128Mb of fast RAM to store the scene in costs a lot more than doubling the amount of processing power on the GPU. The amount of RAM on the chip becomes a major cost driver for a $120 card. None of those issues affect a software solution though - and it's possible that a scene capture solution *could* be better than a conventional immediate mode renderer - but I still think that it will at MOST only buy you a factor of 2x or 4x pixel rate speedup and you have a MUCH larger gap than that to hurdle. Also, in order to use scene capture, you are reliant on the underlying graphics API to be supportive of this technique. Neither OpenGL nor Direct3D are terribly helpful. You can write things like: Render 100 polygons. Read back the image they created. if the pixel at (123,456) is purple then { put that image into texture memory. Render another 100 polygons using the texture you just created. } ...scene capture algorithms have a very hard time with things like that becau
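Steve's storage objection above is easy to quantify with a back-of-envelope estimate. The 36-byte vertex layout below is a hypothetical figure (xyz position, rgba color, and one texture coordinate pair, all as 4-byte floats), just to show the order of magnitude:

```python
polys = 2_000_000                 # "a couple of million polygons"
verts_per_poly = 3                # independent triangles, no vertex sharing
bytes_per_vert = (3 + 4 + 2) * 4  # xyz + rgba + st as floats = 36 bytes (hypothetical)
total_mb = polys * verts_per_poly * bytes_per_vert / 2**20
print(round(total_mb))            # ~206 MB of on-card scene buffer
```

Vertex sharing and compressed formats would shrink this, but even a several-fold reduction leaves a scene buffer comparable to the entire memory of a circa-2002 card, which is the cost-driver point being made.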
Re: [Dri-devel] Mesa software blending
> > Does 4 do pixel-based fog? > yep. So in some cases it is much slower than 3, isn't it? > It's because they are quite similar operations so they use the same chip > logic. In fact you have a bit to choose whether you want alpha or fog. It > was a design option. So they did not want to have two instances of the same logic on a single chip to implement this "rare" combination, did they? > Don't know, but a transparent window in a foggy level is not a situation > very hard to happen... I see. > In this case the visual difference can be very big... Sad. Shame to ATI!:) Thanks for all these clarifications. They're really interesting matters for me. Sergey
Re: [Dri-devel] Mesa software blending
On 2002.04.04 09:44 Sergey V. Udaltsov wrote: > > But the OpenGL spec says that the fog color is calculated on a _pixel_ > > basis and not on a _vertex_ basis. Indeed the result is different, > > especially in long polygons that span from the front way to the back. > Does 4 do pixel-based fog? > yep. > > Mach64 is able to do the fog properly, i.e., on a pixel basis, but > _not_ > Why? I know - it's only ATI who can answer this question...:) It's because they are quite similar operations so they use the same chip logic. In fact you have a bit to choose whether you want alpha or fog. It was a design option. > > when alpha blending since it uses the path on chip. So the problem is > only > > what to do when both fog and alpha blending are enabled. > Are there many apps using these effects together? > Don't know, but a transparent window in a foggy level is not a situation very hard to happen... > > The solution of using these depending on the contents of an env var is a > > compromise so that gamers achieve a better gameplay sacrificing a little > > > the visual quality and the OpenGL conformance. > Actually, end users in 80% (or 99%?) do not especially care about > conformance. The visual quality really matters. > > > There are other situations like this one. Leif checked on Unreal and > there > > is one (also when alpha blending) that happens and according to his > > experiments reverting to software leads to a severe performance hit. > It was predictable, wasn't it? And any predictions about _visual_ > difference between these two methods? Will users see the difference > easily? Say, if you get 10* speedup with 5% worse quality (I do not > really know how to measure it though:) - almost nobody will really use > SW mode. > In this case the visual difference can be very big... > ... > > Cheers, > > Sergey > José Fonseca
Re: [Dri-devel] Mesa software blending
On 2002.04.04 09:08 Keith Whitwell wrote: > On Thu, 4 Apr 2002 01:56:22 +0100 > José Fonseca <[EMAIL PROTECTED]> wrote: > > > ... > > > the further away the vertex is, the nearer its color is to the fog > > background color. Since the colors are interpolated in the triangle > this > > gave the impression of fog. > > > > But the OpenGL spec says that the fog color is calculated on a _pixel_ > > basis and not on a _vertex_ basis. Indeed the result is different, > > especially in long polygons that span from the front way to the back. > > > Actually I think it gives you scope to do either. Well, the OpenGL 1.3 spec, sec. 3.10 says: "Further, f need not be computed at each fragment, but may be computed at each vertex and interpolated as other data are.", but it also says "If enabled, fog blends a fog color with a rasterized fragment's post-texturing color using a blending factor f." Since it's applied later in the pipeline I'm not sure that the results will be the same, even without texturing.. > The issue with adding > fog into the vertex colors is what happens when texturing is turned on - > the vertex color may not even contribute to the color of the generated > fragments. I see, it depends on the texture environment... so unless there is a way around it we will have to fall back to software on these no matter what. > > Keith > José Fonseca
Re: [Dri-devel] Mesa software blending
> But the OpenGL spec says that the fog color is calculated on a _pixel_ > basis and not on a _vertex_ basis. Indeed the result is different, > especially in long polygons that span from the front way to the back. Does 4 do pixel-based fog? > Mach64 is able to do the fog properly, i.e., on a pixel basis, but _not_ Why? I know - it's only ATI who can answer this question...:) > when alpha blending since it uses the path on chip. So the problem is only > what to do when both fog and alpha blending are enabled. Are there many apps using these effects together? > The solution of using these depending on the contents of an env var is a > compromise so that gamers achieve a better gameplay sacrificing a little > the visual quality and the OpenGL conformance. Actually, end users in 80% (or 99%?) do not especially care about conformance. The visual quality really matters. > There are other situations like this one. Leif checked on Unreal and there > is one (also when alpha blending) that happens and according to his > experiments reverting to software leads to a severe performance hit. It was predictable, wasn't it? And any predictions about _visual_ difference between these two methods? Will users see the difference easily? Say, if you get 10* speedup with 5% worse quality (I do not really know how to measure it though:) - almost nobody will really use SW mode. > > Cool! And the default version will be HW-based non-conformant, won't it? > This is very subjective, but if we assume that DRI aims to be OpenGL > conformant, I vote for sw-based conformant... :)) I see your point. Again "real life vs standards":). It's time for a new poll on dri.sourceforge.net:) > > BTW, I can seriously recommend 3ddesktop as a test tool. It supports > > several effects (blending, textures, etc. with on/off switching) so its > > behavior could give a lot of hints to the developers. > I'll check it. Thanks a lot. I'm not lobbying it.
I just like it - and would like it to work properly on Mach64 - today I have a lot of problems with it - which I don't have in pure SW mode. Cheers, Sergey
Re: [Dri-devel] Mesa software blending
On Thu, 4 Apr 2002 01:56:22 +0100 José Fonseca <[EMAIL PROTECTED]> wrote: > On 2002.04.03 23:50 Sergey V. Udaltsov wrote: > > > He!He!.. you missed!! It's a mix of variant 1 and 2..! :) > > Cool. At least 1+2 is the answer (call it 5). Thanks. > > > > > As Leif previously said in his reply, it can be done in hardware by > > > messing the colors of the vertex to incorporate the fog (as software > > Mesa > > > used to do in 3.x) but it's non-conformant to the OpenGL spec. > > What's the difference in Mesa 4? Is it better? Can it be done in HW? > > Will this non-conformance cause visible problems on display? > > In Mesa 3.x when fog was enabled the vertex colors were changed so that > the further away the vertex is, the nearer its color is to the fog > background color. Since the colors are interpolated in the triangle this > gave the impression of fog. > > But the OpenGL spec says that the fog color is calculated on a _pixel_ > basis and not on a _vertex_ basis. Indeed the result is different, > especially in long polygons that span from the front way to the back. Actually I think it gives you scope to do either. The issue with adding fog into the vertex colors is what happens when texturing is turned on - the vertex color may not even contribute to the color of the generated fragments. Keith
Re: [Dri-devel] Mesa software blending
On 2002.04.03 23:50 Sergey V. Udaltsov wrote: > > He!He!.. you missed!! It's a mix of variant 1 and 2..! :) > Cool. At least 1+2 is the answer (call it 5). Thanks. > > > As Leif previously said in his reply, it can be done in hardware by > > messing the colors of the vertex to incorporate the fog (as software > Mesa > > used to do in 3.x) but it's non-conformant to the OpenGL spec. > What's the difference in Mesa 4? Is it better? Can it be done in HW? > Will this non-conformance cause visible problems on display? In Mesa 3.x when fog was enabled the vertex colors were changed so that the further away the vertex is, the nearer its color is to the fog background color. Since the colors are interpolated in the triangle this gave the impression of fog. But the OpenGL spec says that the fog color is calculated on a _pixel_ basis and not on a _vertex_ basis. Indeed the result is different, especially in long polygons that span from the front way to the back. Mach64 is able to do the fog properly, i.e., on a pixel basis, but _not_ when alpha blending since it uses the path on chip. So the problem is only what to do when both fog and alpha blending are enabled. The solution of using these depending on the contents of an env var is a compromise so that gamers achieve a better gameplay sacrificing a little the visual quality and the OpenGL conformance. There are other situations like this one. Leif checked on Unreal and there is one (also when alpha blending) that happens and according to his experiments reverting to software leads to a severe performance hit. > > > So the solution found is to do it either like this or by software > > depending on the value of an environment var. > Cool! And the default version will be HW-based non-conformant, won't it? This is very subjective, but if we assume that DRI aims to be OpenGL conformant, I vote for sw-based conformant... > ... > > BTW, I can seriously recommend 3ddesktop as a test tool.
It supports > several effects (blending, textures, etc. with on/off switching) so its > behavior could give a lot of hints to the developers. I'll check it. > > Cheers, > > Sergey > Regards, José Fonseca
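The per-vertex versus per-pixel difference described in this message can be seen numerically. The sketch below (illustrative depths and a GL_EXP-style fog curve; this is not the Mach64 hardware path) compares interpolating the fog *factor* computed at the two ends of a long span against computing the factor from the interpolated depth, as the spec's per-fragment wording describes:

```python
import math

def fog_factor(z, density=1.5):
    return math.exp(-density * z)          # GL_EXP-style fog curve

z_near, z_far = 0.1, 1.0                   # a polygon spanning far into the scene
t = 0.5                                    # fragment halfway along the span

# Per-vertex (Mesa 3.x style): interpolate the factors computed at the endpoints.
f_vertex = (1 - t) * fog_factor(z_near) + t * fog_factor(z_far)

# Per-pixel (spec wording): interpolate the depth, then compute the factor.
f_pixel = fog_factor((1 - t) * z_near + t * z_far)

print(round(f_vertex, 3), round(f_pixel, 3))   # 0.542 0.438
```

Because the fog curve is nonlinear, the two disagree most on exactly the long front-to-back polygons mentioned above; for short spans the interpolated factor is a close approximation, which is why the spec permits either.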
Re: [Dri-devel] Mesa software blending
> He!He!.. you missed!! It's a mix of variant 1 and 2..! :) Cool. At least 1+2 is the answer (call it 5). Thanks. > As Leif previously said in his reply, it can be done in hardware by > messing the colors of the vertex to incorporate the fog (as software Mesa > used to do in 3.x) but it's non-conformant to the OpenGL spec. What's the difference in Mesa 4? Is it better? Can it be done in HW? Will this non-conformance cause visible problems on display? > So the solution found is to do it either like this or by software > depending on the value of an environment var. Cool! And the default version will be HW-based non-conformant, won't it? > As you can guess implementing this is not a high priority. I see. > Well, CPU power does matter if the user is inclined to choose the > conformant way and do software fallback. If it's just one envvar - this solution will satisfy any user (except for one who wants GeForce performance for the price of Mach64:). > The fact that the mach64 driver needs several software fallbacks to be > really OpenGL conformant is one of the reasons for my interest in > improving the Mesa sw rendering. (Which I must correct you, was the _real_ > start of the thread :) You're right. I missed the start. And quality SW rendering is still a very important issue. BTW, I can seriously recommend 3ddesktop as a test tool. It supports several effects (blending, textures, etc. with on/off switching) so its behavior could give a lot of hints to the developers. Cheers, Sergey
Re: [Dri-devel] Mesa software blending
On 2002.04.03 14:43 Sergey V. Udaltsov wrote: > After all these interesting and informative discussions, everyone has > forgotten the start of the thread:) Basically, there should be one answer > to the question whether and how "blending+fog" can be implemented. > Possible variants: > 1. Yes, it can be done with hardware acceleration. DRI team knows how. > 2. No, it cannot be done. DRI team knows why. > 3. Possibly yes but actually no. ATI knows but DRI will never know (NDA > issues) > 4. Possibly yes but now DRI team does not know exactly how... He!He!.. you missed!! It's a mix of variant 1 and 2..! :) > Which answer is the correct one? After this question is answered, we > (users/testers) could get the idea whether we'll finally have HW > implementation or DRI will use indirect rendering here. As Leif previously said in his reply, it can be done in hardware by messing the colors of the vertex to incorporate the fog (as software Mesa used to do in 3.x) but it's non-conformant to the OpenGL spec. So the solution found is to do it either like this or by software depending on the value of an environment var. As you can guess implementing this is not a high priority. > Actually, here is the point where CPU power does not really matter. What > really matters is "possible"/"impossible" and "know"/"don't know" (sure, > add "want/don't want":) > Well, CPU power does matter if the user is inclined to choose the conformant way and do software fallback. The fact that the mach64 driver needs several software fallbacks to be really OpenGL conformant is one of the reasons for my interest in improving the Mesa sw rendering. (Which I must correct you, was the _real_ start of the thread :) > BTW, just tested mach64 driver with 3ddesktop. _Really_ fast but there > are a lot of artefacts and incorrect rendering. Probably, it's a buggy app > (version 0.1.2:) but not necessarily. Could anyone please try and > comment on the experience?
> > Cheers, > > Sergey > Regards, José Fonseca
RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
After all these interesting and informative discussions, everyone has forgotten the start of the thread:) Basically, there should be one answer to the question of whether and how "blending+fog" can be implemented. Possible variants: 1. Yes, it can be done with hardware acceleration. DRI team knows how. 2. No, it cannot be done. DRI team knows why. 3. Possibly yes but actually no. ATI knows but DRI will never know (NDA issues) 4. Possibly yes but now DRI team does not know exactly how... Which answer is the correct one? After this question is answered, we (users/testers) could get the idea whether we'll finally have a HW implementation or DRI will use indirect rendering here. Actually, here is the point where CPU power does not really matter. What really matters is "possible"/"impossible" and "know"/"don't know" (sure, add "want/don't want":) BTW, just tested the mach64 driver with 3ddesktop. _Really_ fast but there are a lot of artefacts and incorrect rendering. Probably, it's a buggy app (version 0.1.2:) but not necessarily. Could anyone please try it and comment on their experience? Cheers, Sergey
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > The only thing video > > cards have today that is really better than the main processor is massive > > amounts of memory bandwidth. > > That is far from the truth - they have internal pipelining > and parallelism. Their use of silicon can be optimised to balance > the performance of just one single algorithm. You can never do that > for a machine that also has to run an OS, word process and run > spreadsheets. Modern processors have internal pipelining and parallelism as well. Most of the processing power of today's CPUs goes completely unused. It is possible to create optimized implementations of efficient algorithms using Single-Instruction-Multiple-Data (SIMD) instructions. > > Since memory bandwidth is increasing rapidly,... > > It is?!? Let's look at the facts: > > Since 1989, CPU speed has grown by a factor of 70. Over the same > period the memory bus has increased by a factor of maybe 6 or so. We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM) to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have over 16 times the memory bandwidth available today than we did just 5 years ago. Available memory bandwidth has been growing more quickly than processor clock speed lately, and I do not foresee an end to this any time soon. > On the other hand, the graphics card can use heavily pipelined > operations to guarantee that the memory bandwidth is 100% utilised Overutilised, in my opinion. The amount of overdraw performed by today's video cards on modern games and applications is incredible. Immediate-mode rendering is an inefficient algorithm. Video cards tend to have extremely well optimized implementations of this inefficient algorithm. > - and can use an arbitrarily large amount of parallelism to improve > throughput. The main CPU can't do that because its memory access > patterns are not regular and it has little idea where the next byte > has to be read from until it's too late.
Modern processors have a considerable amount of parallelism built in. With advanced prefetch and streaming SIMD instructions it is very possible to do these types of operations on a modern processor. It will, however, take another couple of years to be able to render at great framerates and high resolutions. > You only have to look at the gap you are trying to bridge - a > modern graphics card is *easily* 100 times faster at rendering > sophisticated pixels (with pixel shaders, multiple textures and > antialiasing) than the CPU. They are limited in what they can do. In order to allow more flexibility they have recently introduced pixel shaders, which basically turns the video card into a mini-CPU. Modern processors can perform these features more quickly and would allow an order of magnitude more flexibility in what can be done. > > A properly > > implemented and optimized software version of a tile-based "scene-capture" > > renderer much like that used in Kyro could perform as well as the latest > > video cards in a year or two. This is what I am dabbling with at the > > moment. > > I await this with interest - but 'scene capture' systems tend to be > unusable with modern graphics API's...they can't run either OpenGL > or Direct3D efficiently for arbitrary input. If there were to be > some change in consumer needs that would result in 'scene capture' > being a usable technique - then the graphics cards can easily take > that on board and will *STILL* beat the heck out of doing it in > the CPU. Scene capture is also only feasible if the number of > polygons being rendered is small and bounded - the trends are > for modern graphics software to generate VAST numbers of polygons > on-the-fly precisely so they don't have to be stored in slow old > memory. Kyro-based video cards perform quite well. They are not quite up to the level of nVidia's latest cards but this is new technology and is being worked on by a relatively new company.
These cards do not require nearly as much memory bandwidth as immediate-mode renderers, performing zero overdraw. They are more processing intensive rather than being bandwidth intensive. I see this as a more efficient algorithm. > Everything that is speeding up the main CPU is also speeding up > the graphics processor - faster silicon, faster busses and faster > RAM all help the graphics just as much as they help the CPU. Everything starts out in hardware and eventually moves to software. There will come a time when the basic functionality provided by video cards can be easily done by a main processor. The extra features offered by the video cards, such as pixel shaders, are simply attempts to stand in as a main processor. Once the basic functionality of the video card can be performed by the main system processor, there will really be no need for extra hardware to perform these tasks. What I see now is a move by the video card companies to software-based solutions (pixel shaders, etc.) They have recognized that there
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
I am definitely all for increasing the performance of the software renderer. Eventually the main system processor will be fast enough to perform all of this without the need for a third-party graphics card. The only thing video cards have today that is really better than the main processor is massive amounts of memory bandwidth. Since memory bandwidth is increasing rapidly, I foresee the need for video cards lessening in the future. A properly implemented and optimized software version of a tile-based "scene-capture" renderer much like that used in the Kyro could perform as well as the latest video cards in a year or two. This is what I am dabbling with at the moment. -Raystonn - Original Message - From: "Brian Paul" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Monday, April 01, 2002 6:36 AM Subject: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending "José Fonseca" wrote: > > In these last few days I have been working on the Mesa software blending > and the existing MMX bug. I've made some progress. > > I made a small test program which calls the relevant functions directly as > Alex suggested. In the process I added comments to the assembly code > (which had none). The error is due to the fact that the inner loop blends > two pixels at the same time, so if the mask of the first element is zero > then both are skipped. I also spotted some errors in the runin section, > e.g., it ANDs with 4 and compares the result with 8, which is impossible... > I still have to study the x86 architecture optimization a little further > to know how to optimally fix both these situations. > > I also made two optimizations in blend_transparency (s_blend.c) which have > no effect on the result precision but that achieved a global speedup of > 30% in the function. These optimizations are in the C code and benefit all > architectures. > > The first was to avoid the repetition of the input variable in the DIV255.
> At least my version of gcc (2.96) wasn't factoring the common code out; > avoiding the repetition yielded a 17% speedup. > > The second was to factor the equation of blending, reducing in half the > number of multiplications. This optimization can be applied in other > places in this file as well. Good work. I'll review your changes and probably apply them to the Mesa trunk (for version 4.1) later today. > A third optimization that I'll try is the "double blend" trick (make two > 8-bit multiplications at the same time in a 32-bit register) as documented > by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite > interesting site referred to me by Brian). I was going to do that someday too. Go for it. > I would like to keep improving Mesa software rendering performance. I know > that due to its versatility and power Mesa will never rival a > dedicated, non-conformant software 3D engine such as Unreal's; > nevertheless I think that it's possible to make it useful for simple > realtime rendering. Regards, Despite the proliferation of 3D hardware, there'll always be applications for software rendering. For example, the 16-bit color channel feature is being used by several animation houses. -Brian ___ Mesa3d-dev mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> There is no sign whatever that CPU's are "catching up" with > graphics cards - and no logical reason why they ever will. It could however be argued that CPUs are "catching up" with the needs of a certain level of user. Not the hardcore gamer, but quite possibly the hobbyist 3D artist or 3D freeware game player. A different argument, but IMO a more important one. Either way, I don't see anyone arguing we should not be improving the software renderer. :)
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
Gack! I'm *so* sick of hearing this argument... On Tue, 2 Apr 2002, Raystonn wrote: > I am definitely all for increasing the performance of the software renderer. Yes. > Eventually the main system processor will be fast enough to perform all of > this without the need for a third party graphics card. I very much doubt this will happen within the lifetime of silicon chip technology. Maybe with nanotech, biological or quantum computing - but probably not even then. > The only thing video > cards have today that is really better than the main processor is massive > amounts of memory bandwidth. That is far from the truth - they have internal pipelining and parallelism. Their use of silicon can be optimised to balance the performance of just one single algorithm. You can never do that for a machine that also has to run an OS, word process and run spreadsheets. > Since memory bandwidth is increasing rapidly,... It is?!? Let's look at the facts: Since 1989, CPU speed has grown by a factor of 70. Over the same period the memory bus has increased by a factor of maybe 6 or so. Caching can go some way to hiding that - but not for things like graphics that need massive frame buffers and huge texture maps. Caching also makes parallelism difficult, and rendering algorithms are highly parallelisable. PCs are *horribly* memory-bound. > I foresee the need for video cards lessening in the future. Whilst memory bandwidth inside the main PC is increasing, it's doing so very slowly - and all the tricks it uses to get that speedup are equally applicable to the graphics hardware (things like DDR for example). On the other hand, the graphics card can use heavily pipelined operations to guarantee that the memory bandwidth is 100% utilised - and can use an arbitrarily large amount of parallelism to improve throughput. The main CPU can't do that because its memory access patterns are not regular and it has little idea where the next byte has to be read from until it's too late.
Also, the instruction set of the main CPU isn't optimised for the rendering task - whereas that is the ONLY thing the graphics chip has to do. The main CPU has all this legacy crap to deal with because it's expected to run programs that were written 20 years ago. Every generation of graphics chip can have a totally redesigned internal architecture that exactly fits the profile of today's RAM and silicon speeds. You only have to look at the gap you are trying to bridge - a modern graphics card is *easily* 100 times faster at rendering sophisticated pixels (with pixel shaders, multiple textures and antialiasing) than the CPU. > A properly > implemented and optimized software version of a tile-based "scene-capture" > renderer much like that used in Kyro could perform as well as the latest > video cards in a year or two. This is what I am dabbling with at the > moment. I await this with interest - but 'scene capture' systems tend to be unusable with modern graphics API's...they can't run either OpenGL or Direct3D efficiently for arbitrary input. If there were to be some change in consumer needs that would result in 'scene capture' being a usable technique - then the graphics cards can easily take that on board and will *STILL* beat the heck out of doing it in the CPU. Scene capture is also only feasible if the number of polygons being rendered is small and bounded - the trends are for modern graphics software to generate VAST numbers of polygons on-the-fly precisely so they don't have to be stored in slow old memory. Everything that is speeding up the main CPU is also speeding up the graphics processor - faster silicon, faster busses and faster RAM all help the graphics just as much as they help the CPU. However, increasing the number of transistors you can have on a chip doesn't help the CPU out very much.
Their instruction sets are not getting more complex in proportion to the increase in silicon area - and their ability to make use of more complex instructions is already limited by the brain power of compiler writers. Most of the speedup in modern CPUs is coming from physically shorter distances for signals to travel and faster clocks - all of the extra gates typically end up increasing the size of the on-chip cache, which has marginal benefits for graphics algorithms. In contrast to that, a graphics chip designer can just double the number of pixel processors or something and get an almost linear increase in performance with chip area, with relatively little design effort and no software changes. If you doubt this, look at the progress over the last 5 or 6 years. In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate. In 2002 the GeForce-4 has a fill rate of 4.8 billion (antialiased) pixels/sec - it's 100 times faster. Over the same period, your 1996 233MHz CPU has gone up to a 2GHz machine ...a mere 10x speedup. The graphics cards are also gaining features. Over that same period, they added - windowing, hardware T&L, antiali
RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
Hello Raystonn, sorry, but dedicated ASIC hardware is always faster. (you are a troll, aren't you?) In the straightforward OpenGL case (flat and smooth shading) you can turn on several features in the pixel path and in the geometry pipeline (culling, 8x lighting, clipping) that you won't be able to perform at the same speed with a normal CPU setup. It's not only the bandwidth, it's the floating point performance that the graphics chips are capable of by means of multiple parallel and dedicated FPU units. For the pixel path, when (multi)texturing is enabled, or alpha blending or fogging or something else that does readback (stencil buffer, depth buffer dependent operations, anti-aliased lines), then you will spot that a classical CPU-based system will not perform at its best when doing pixel manipulations of that sort. I think regular graphics hardware can clear your framebuffer in a fraction of the time that a CPU-mainboard pairing can. That's been the case since the good old IBM VGA from ages ago. And don't tell me a UMA architecture is better in that case. You first have to accept that the RAMDAC is time-sharing the same bus system and therefore it permanently consumes bus cycles. But if rasterisation has separate memory with an option for a wider bus, separate caches and higher clocked memory, you will get better performance by design. Regards, Alex. -Original Message- From: Raystonn [mailto:[EMAIL PROTECTED]] Sent: Tuesday, April 02, 2002 19:45 To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending [Resending, fell into last night's black hole it seems.] I am definitely all for increasing the performance of the software renderer. Eventually the main system processor will be fast enough to perform all of this without the need for a third-party graphics card. The only thing video cards have today that is really better than the main processor is massive amounts of memory bandwidth.
Since memory bandwidth is increasing rapidly, I foresee the need for video cards lessening in the future. A properly implemented and optimized software version of a tile-based "scene-capture" renderer much like that used in Kyro could perform as well as the latest video cards in a year or two. This is what I am dabbling with at the moment. -Raystonn
RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> > > I don't think so. I haven't noticed a problem with fog > in the tunnel demo. > > So it works for you, doesn't it? Envious. > > For me, the fog effect does not work. Some time ago, someone (Jose?) > > even explained that it should not work on mach64 (alpha > blending + some > > other effect?) So my question was whether it should work now or not. > > No, this won't fix the problem. Mach64 can't do fog and > blending at the > same time, and the tunnel demo uses blending for the menu. > There was some > discussion of trying to use software fogging per-vertex when hardware > blending is enabled, but no one has implemented it yet. Jose was working on some MMX code that was currently disabled in the Mesa source due to bugs in the coding. So he fixed problems that could not come into effect in your case. With that fix you might spot some speedup with an MMX-capable CPU if you are running specific Mesa demos on it. Concerning the tunnel demo: as long as fogging is not required (at least I think it is not) for the rendering of the alpha-blended help texts and the other informational texts, it would be best to just disable fogging for drawing these elements. Consider that mode turn-off as a fix for some suboptimal application coding. (I should have a look at that source and check if or why it's not already done in that demo...) Regards, Alex.
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> Raystonn wrote: > > [Resending, fell into last night's black hole it seems.] > > I am definitely all for increasing the performance of the software renderer. > Eventually the main system processor will be fast enough to perform all of > this without the need for a third party graphics card. The only thing video > cards have today that is really better than the main processor is massive > amounts of memory bandwidth. Since memory bandwidth is increasing rapidly, > I foresee the need for video cards lessening in the future. A properly > implemented and optimized software version of a tile-based "scene-capture" > renderer much like that used in Kyro could perform as well as the latest > video cards in a year or two. This is what I am dabbling with at the > moment. That's debatable. My personal opinion is that special-purpose graphics hardware will always perform better than a general-purpose CPU. The graphics pipeline is amenable to very specialized optimizations (both in computation and the memory system) that aren't applicable to a general-purpose CPU. Of course, looking far enough into the future, all bets are off. -Brian
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
On 2 Apr 2002, Sergey V. Udaltsov wrote: > > I don't think so. I haven't noticed a problem with fog in the tunnel demo. > So it works for you, doesn't it? Envious. > For me, the fog effect does not work. Some time ago, someone (Jose?) > even explained that it should not work on mach64 (alpha blending + some > other effect?) So my question was whether it should work now or not. No, this won't fix the problem. Mach64 can't do fog and blending at the same time, and the tunnel demo uses blending for the menu. There was some discussion of trying to use software fogging per-vertex when hardware blending is enabled, but no one has implemented it yet. -- Leif Delgass http://www.retinalburn.net
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
"Sergey V. Udaltsov" wrote: > > > I don't think so. I haven't noticed a problem with fog in the tunnel demo. > So it works for you, doesn't it? Envious. > For me, the fog effect does not work. Some time ago, someone (Jose?) > even explained that it should not work on mach64 (alpha blending + some > other effect?) So my question was whether it should work now or not. You didn't say anything about Mach64 in your original message - I assumed you were talking about software rendering/blending. I haven't tried the Mach64 branch yet. -Brian
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
> I don't think so. I haven't noticed a problem with fog in the tunnel demo. So it works for you, doesn't it? Envious. For me, the fog effect does not work. Some time ago, someone (Jose?) even explained that it should not work on mach64 (alpha blending + some other effect?) So my question was whether it should work now or not. Cheers, Sergey
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending
"Sergey V. Udaltsov" wrote: > > > In these last few days I have been working on the Mesa software blending > > and the existing MMX bug. I've made some progress. > Sorry for my ignorance, does this blending have anything to do with the > incorrect fog handling in the tunnel app? Will this patch fix it? I don't think so. I haven't noticed a problem with fog in the tunnel demo. -Brian
Re: [Dri-devel] Mesa software blending
> In these last few days I have been working on the Mesa software blending > and the existing MMX bug. I've made some progress. Sorry for my ignorance, does this blending have anything to do with the incorrect fog handling in the tunnel app? Will this patch fix it? Sergey
Re: [Dri-devel] Mesa software blending
"José Fonseca" wrote: > > In these last few days I have been working on the Mesa software blending > and the existing MMX bug. I've made some progress. > > I made a small test program which calls the relevant functions directly as > Alex suggested. In the process I added comments to the assembly code > (which had none). The error is due to the fact that the inner loop blends > two pixels at the same time, so if the mask of the first element is zero > then both are skipped. I also spotted some errors in the runin section, > e.g., it ANDs with 4 and compares the result with 8, which is impossible... > I still have to study the x86 architecture optimization a little further > to know how to optimally fix both these situations. > > I also made two optimizations in blend_transparency (s_blend.c) which have > no effect on the result precision but that achieved a global speedup of > 30% in the function. These optimizations are in the C code and benefit all > architectures. > > The first was to avoid the repetition of the input variable in the DIV255. > At least my version of gcc (2.96) wasn't factoring the common code out; > avoiding the repetition yielded a 17% speedup. > > The second was to factor the equation of blending, reducing in half the > number of multiplications. This optimization can be applied in other > places in this file as well. Good work. I'll review your changes and probably apply them to the Mesa trunk (for version 4.1) later today. > A third optimization that I'll try is the "double blend" trick (make two > 8-bit multiplications at the same time in a 32-bit register) as documented > by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite > interesting site referred to me by Brian). I was going to do that someday too. Go for it. > I would like to keep improving Mesa software rendering performance.
I know > that due to its versatility and power Mesa will never rival a > dedicated, non-conformant software 3D engine such as Unreal's; > nevertheless I think that it's possible to make it useful for simple > realtime rendering. Regards, Despite the proliferation of 3D hardware, there'll always be applications for software rendering. For example, the 16-bit color channel feature is being used by several animation houses. -Brian
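For reference, the "double blend" trick discussed above exploits the fact that in a packed 0x00RRGGBB pixel the red and blue channels sit 16 bits apart, so one 32-bit multiply can scale both at once. A minimal sketch - not Mesa's or Herf's actual code, and with the weight t in 0..256 rather than 0..255 so that the >>8 stays exact; the alpha channel is ignored:

```c
/* Sketch of the "double blend" trick: the red and blue channels
 * of a packed 0x00RRGGBB pixel sit 16 bits apart, so one 32-bit
 * multiply scales both at once.  t is a weight in 0..256 (NOT
 * 0..255), which makes the >>8 exact.  Illustrative only. */
#include <stdint.h>

static uint32_t blend_argb(uint32_t src, uint32_t dst, uint32_t t)
{
    uint32_t srb = src & 0x00FF00FFu;   /* red and blue fields */
    uint32_t drb = dst & 0x00FF00FFu;
    uint32_t sg  = src & 0x0000FF00u;   /* green field         */
    uint32_t dg  = dst & 0x0000FF00u;

    /* Each 8-bit field times t <= 256 fits in 16 bits, so the
     * two products in srb*t cannot spill into each other. */
    uint32_t rb = ((srb * t + drb * (256u - t)) >> 8) & 0x00FF00FFu;
    uint32_t g  = ((sg  * t + dg  * (256u - t)) >> 8) & 0x0000FF00u;
    return rb | g;
}
```

So three masked multiplies per pixel instead of one per channel - which is exactly why it was attractive for the s_blend.c inner loop.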
[Dri-devel] Mesa software blending
In these last few days I have been working on the Mesa software blending and the existing MMX bug. I've made some progress. I made a small test program which calls the relevant functions directly, as Alex suggested. In the process I added comments to the assembly code (which had none). The error is due to the fact that the inner loop blends two pixels at the same time, so if the mask of the first element is zero then both are skipped. I also spotted some errors in the runin section, e.g., it ANDs with 4 and compares the result with 8, which is impossible... I still have to study x86 architecture optimization a little further to know how to optimally fix both these situations. I also made two optimizations in blend_transparency (s_blend.c) which have no effect on the result precision but that achieved a global speedup of 30% in the function. These optimizations are in the C code and benefit all architectures. The first was to avoid the repetition of the input variable in the DIV255. At least my version of gcc (2.96) wasn't factoring the common code out; avoiding the repetition yielded a 17% speedup. The second was to factor the equation of blending, reducing in half the number of multiplications. This optimization can be applied in other places in this file as well. A third optimization that I'll try is the "double blend" trick (make two 8-bit multiplications at the same time in a 32-bit register) as documented by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite interesting site referred to me by Brian). I would like to keep improving Mesa software rendering performance. I know that due to its versatility and power Mesa will never rival a dedicated, non-conformant software 3D engine such as Unreal's; nevertheless I think that it's possible to make it useful for simple realtime rendering.
Regards,

José Fonseca

Index: swrast/s_blend.c
===
RCS file: /cvsroot/mesa3d/Mesa/src/swrast/s_blend.c,v
retrieving revision 1.14
diff -u -r1.14 s_blend.c
--- swrast/s_blend.c	27 Mar 2002 15:49:27 -	1.14
+++ swrast/s_blend.c	1 Apr 2002 00:34:20 -
@@ -132,12 +132,24 @@
 #if CHAN_BITS == 8
 /* This satisfies Glean and should be reasonably fast */
 /* Contributed by Nathan Hand */
+#if 0
 #define DIV255(X) (((X) << 8) + (X) + 256) >> 16
+#else
+   const GLint temp;
+#define DIV255(X) (temp = (X), ((temp << 8) + temp + 256) >> 16)
+#endif
+#if 0
 const GLint s = CHAN_MAX - t;
 const GLint r = DIV255(rgba[i][RCOMP] * t + dest[i][RCOMP] * s);
 const GLint g = DIV255(rgba[i][GCOMP] * t + dest[i][GCOMP] * s);
 const GLint b = DIV255(rgba[i][BCOMP] * t + dest[i][BCOMP] * s);
 const GLint a = DIV255(rgba[i][ACOMP] * t + dest[i][ACOMP] * s);
+#else
+const GLint r = DIV255((rgba[i][RCOMP] - dest[i][RCOMP]) * t) +
+   dest[i][RCOMP];
+const GLint g = DIV255((rgba[i][GCOMP] - dest[i][GCOMP]) * t) +
+   dest[i][GCOMP];
+const GLint b = DIV255((rgba[i][BCOMP] - dest[i][BCOMP]) * t) +
+   dest[i][BCOMP];
+const GLint a = DIV255((rgba[i][ACOMP] - dest[i][ACOMP]) * t) +
+   dest[i][ACOMP];
+#endif
 #undef DIV255
 #elif CHAN_BITS == 16
 const GLfloat tt = (GLfloat) t / CHAN_MAXF;

Index: X86/mmx_blend.S
===
RCS file: /cvsroot/mesa3d/Mesa/src/X86/mmx_blend.S,v
retrieving revision 1.5
diff -u -r1.5 mmx_blend.S
--- X86/mmx_blend.S	28 Mar 2001 20:44:44 -	1.5
+++ X86/mmx_blend.S	1 Apr 2002 00:35:13 -
@@ -7,25 +7,35 @@
 ALIGNTEXT16
 GLOBL GLNAME(_mesa_mmx_blend_transparency)
+/*
+ * void blend_transparency( GLcontext *ctx,
+ *                          GLuint n,
+ *                          const GLubyte mask[],
+ *                          GLchan rgba[][4],
+ *                          CONST GLchan dest[][4] )
+ *
+ * Common transparency blending mode.
+ */
 GLNAME( _mesa_mmx_blend_transparency ):
     PUSH_L( EBP )
     MOV_L ( ESP, EBP )
     SUB_L ( CONST(52), ESP )
     PUSH_L( EBX )
+    MOV_L ( CONST(16711680), REGOFF(-8, EBP) )
     MOV_L ( CONST(16711680), REGOFF(-4, EBP) )
     MOV_L ( CONST(0), REGOFF(-16, EBP) )
     MOV_L ( CONST(-1), REGOFF(-12, EBP) )
     MOV_L ( CONST(-1), REGOFF(-24, EBP) )
     MOV_L ( CONST(0), REGOFF(-20, EBP) )
-    MOV_L ( REGOFF(24, EBP), EAX )
+    MOV_L ( REGOFF(24, EBP), EAX ) /* rgba */
     ADD_L ( CONST(4), EAX )
     MOV_L ( EAX, EDX )
-    AND_L ( REGOFF(20, EBP), EDX )
+    AND_L ( REGOFF(20, EBP), EDX ) /* mask */
     MOV_L ( EDX, EAX )
     AND_L ( CONST(4), EAX )
     CMP_L ( CONST(8), EAX )
-    JNE ( LLBL(GMBT_2) )
+    JNE ( LLBL(GMBT_no_align) )
     MOV_L ( REGOFF(20, EBP), EAX )
     ADD_L ( CONST(3), EAX )
     XOR_L ( ED
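In C terms, the blend factoring in the patch above amounts to the following sketch. The names are illustrative, not Mesa's; div255() is the same shift-add approximation of x/255 used by s_blend.c's DIV255 macro, written here as x*257 (which equals (x<<8)+x):

```c
/* Sketch of the blend factoring from the patch above.  div255()
 * approximates x/255 the same way s_blend.c's DIV255 macro does
 * (x*257 == (x<<8)+x).  Names are illustrative, not Mesa's.
 * Note (s - d) may be negative, so like the original code this
 * relies on an arithmetic right shift of signed ints. */

/* Approximate x/255 for x in roughly [-255*255, 255*255]. */
static int div255(int x)
{
    return (x * 257 + 256) >> 16;
}

/* Unfactored form: two multiplies per channel. */
static int blend_chan_2mul(int s, int d, int t)
{
    return div255(s * t + d * (255 - t));
}

/* Factored form: one multiply per channel. */
static int blend_chan_1mul(int s, int d, int t)
{
    return div255((s - d) * t) + d;
}
```

The factored form returns the destination exactly at t == 0 and the source exactly at t == 255, and agrees with the unfactored form everywhere else to within one least-significant bit, which matches the "no effect on the result precision" claim in the thread.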