Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Nicholas Charles Leippe

> > You know what they found out with all of the hundreds of millions of dollars
> > they spent?  Dedicated hardware still does it faster and cheaper.  Period.
> > It's just like writing a custom routine to sort an array will pretty much
> > always be faster than using the generic qsort.  When you hand-tune for a
> > specific data set you will always get better performance.  This is not to
> > say that the generic implementation will not perform well or even acceptably
> > well, but only to say that it will never, ever, ever perform better.
> 
> Here you are comparing different algorithms.  A custom sort algorithm will
> perform much better than a standard qsort.  I agree.  Implementing something
> in hardware does not mean it uses a more efficient algorithm however.  A
> hardware implementation is just that, an implementation.  It does not change
> the underlying algorithms that are being used.  In fact, it tends to set the
> algorithm in stone.  This makes it very hard to adopt new better algorithms
> as they are invented.  In order to move to a better algorithm you must wait
> for a hardware manufacturer to implement it and then fork out more money.
> 
> Dedicated hardware can do a limited set of things faster.  There is no way
> to increase its capabilities without purchasing new hardware.  This is the
> weakness of having dedicated hardware for very specific functionality.  If a
> better algorithm is invented, it can take an extremely long time for it to
> be brought to market, if it is at all, and it will cost yet more money.
> Software has the advantage of being able to implement new algorithms much
> more quickly.  If a new algorithm is found to be that much better than the
> old, a software implementation of this algorithm will in fact outperform a
> hardware implementation of the older algorithm.  Algorithms are at least an
> order of magnitude more important than the implementation itself.
> 
> -Raystonn

Yes.  Choosing the correct (best) algorithm for a given problem is what
reduces the calculation cost the most.  Yes, once a piece of silicon is
etched, its featureset is 'set in stone', and yes, if you want the latest
and greatest featureset in silicon you'll always have to fork out more
money.  That's how it's always been, and always will be.

However, none of the commodity general-purpose CPUs are designed for
highly parallel execution of parallelizable algorithms--which describes just
about every graphics operation.  How many pixels can a 2GHz Athlon
process at a time?  Usually just one.  How many can dedicated silicon?
Mostly as many as can be fetched from memory at a time.
Thus, the algorithm is _not_ always an order of magnitude more
important than the implementation itself--especially if a parallelized
implementation can provide orders of magnitude more performance than
a serial implementation of the same or an even superior algorithm.
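
As a rough illustration of that point (a sketch of mine, not from the
original post): a plain software blend touches one pixel per loop
iteration, which is exactly the kind of work a rasterizer spreads across
many parallel units.

    #include <stdint.h>

    /* Serial software alpha blend over a span of RGBA8 pixels:
     * dst = src*a + dst*(1-a).  A general-purpose CPU retires only a
     * handful of these operations per clock; dedicated rasterizer silicon
     * works on many pixels at once. */
    static void blend_span_scalar(uint8_t *dst, const uint8_t *src,
                                  const uint8_t *alpha, int n_pixels)
    {
        for (int i = 0; i < n_pixels; i++) {
            int a = alpha[i];
            for (int c = 0; c < 4; c++) {          /* R, G, B, A channels */
                dst[4*i + c] = (uint8_t)((src[4*i + c] * a +
                                          dst[4*i + c] * (255 - a)) / 255);
            }
        }
    }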

It remains a fact that in many cases where graphics algorithms are
concerned, even less efficient algorithms implemented in a highly
parallel fashion in specialized silicon (even _old_ silicon--voodoo2)
can still significantly outperform the snazziest new algorithm
implemented serially in software on even a screaming fast general
purpose cpu.  (see the links in the thread to the comparisons of
hardware w/a voodoo2 vs software w/an athlon 1+ GHz)


Nick





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Raystonn

> > Here you are comparing different algorithms.  A custom sort algorithm will
> > perform much better than a standard qsort.  I agree.  Implementing something
> > in hardware does not mean it uses a more efficient algorithm however.  A
> > hardware implementation is just that, an implementation.  It does not change
> > the underlying algorithms that are being used.  In fact, it tends to set the
> > algorithm in stone.  This makes it very hard to adopt new better algorithms
> > as they are invented.  In order to move to a better algorithm you must wait
> > for a hardware manufacturer to implement it and then fork out more money.
>
> As far as I know, every new graphics chip out there right now is
> programmable - it may have a limited number of operands but the microcode is
> certainly modifiable. They aren't just straight ASICs.

The chips may (or may not, I have not double-checked) be somewhat
programmable, but the arrangement of the chips in the pipeline is not.
Thus, the implementation of whatever algorithm they use can be tweaked
somewhat, but the algorithm itself is pretty much hard-coded.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread David Bronaugh

On Thu, 11 Apr 2002 00:26:17 -0700
"Raystonn" <[EMAIL PROTECTED]> wrote:

> Here you are comparing different algorithms.  A custom sort algorithm will
> perform much better than a standard qsort.  I agree.  Implementing something
> in hardware does not mean it uses a more efficient algorithm however.  A
> hardware implementation is just that, an implementation.  It does not change
> the underlying algorithms that are being used.  In fact, it tends to set the
> algorithm in stone.  This makes it very hard to adopt new better algorithms
> as they are invented.  In order to move to a better algorithm you must wait
> for a hardware manufacturer to implement it and then fork out more money.

As far as I know, every new graphics chip out there right now is programmable - it may 
have a limited number of operands but the microcode is certainly modifiable. They 
aren't just straight ASICs.

David Bronaugh




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Raystonn

> You know what they found out with all of the hundreds of millions of dollars
> they spent?  Dedicated hardware still does it faster and cheaper.  Period.
> It's just like writing a custom routine to sort an array will pretty much
> always be faster than using the generic qsort.  When you hand-tune for a
> specific data set you will always get better performance.  This is not to
> say that the generic implementation will not perform well or even acceptably
> well, but only to say that it will never, ever, ever perform better.

Here you are comparing different algorithms.  A custom sort algorithm will
perform much better than a standard qsort.  I agree.  Implementing something
in hardware does not mean it uses a more efficient algorithm however.  A
hardware implementation is just that, an implementation.  It does not change
the underlying algorithms that are being used.  In fact, it tends to set the
algorithm in stone.  This makes it very hard to adopt new better algorithms
as they are invented.  In order to move to a better algorithm you must wait
for a hardware manufacturer to implement it and then fork out more money.

Dedicated hardware can do a limited set of things faster.  There is no way
to increase its capabilities without purchasing new hardware.  This is the
weakness of having dedicated hardware for very specific functionality.  If a
better algorithm is invented, it can take an extremely long time for it to
be brought to market, if it is at all, and it will cost yet more money.
Software has the advantage of being able to implement new algorithms much
more quickly.  If a new algorithm is found to be that much better than the
old, a software implementation of this algorithm will in fact outperform a
hardware implementation of the older algorithm.  Algorithms are at least an
order of magnitude more important than the implementation itself.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-10 Thread Ian Romanick

On Tue, Apr 09, 2002 at 06:29:54PM -0700, Raystonn wrote:

> First off, current market leaders began their hardware designs back when the
> main CPU was much much slower.  They have an investment in this technology
> and likely do not want to throw it away.  Back when these companies were
> founded, such 3d rendering could not be performed on the main processor at
> all.  The computational power of the main processor has since increased
> dramatically.  The algorithmic approach to 3d rendering should be reexamined
> with current and future hardware in mind.  What was once true may no longer
> be so.
> 
> Second, if a processor intensive algorithm was capable of better efficiency
> than a bandwidth intensive algorithm, there is a good chance these
> algorithms would be moved back over to the main CPU.  If the main processor
> took over 3D rendering, what would the 3D card manufacturers sell?  It would
> put them out of business essentially.  Therefore you cannot gauge what is
> the most efficient algorithm based on what the 3D card manufacturers decide
> to push.  They will push whatever is better for their bottom line and their
> own future.

I'm getting very tired of this thread.  If modern CPUs are so much better
for 3D, then why does Intel, of all companies, still make its own 3D
hardware in addition to CPUs?!?  If the main CPU was so wonderful for 3D
rendering, Intel would be all over it.  In fact, they tried to push that
agenda once when MMX first became available.  Remember?  Had it come out
before the original Voodoo Graphics, things might have been different for a
time.

You know what they found out with all of the hundreds of millions of dollars
they spent?  Dedicated hardware still does it faster and cheaper.  Period.
It's just like writing a custom routine to sort an array will pretty much
always be faster than using the generic qsort.  When you hand-tune for a
specific data set you will always get better performance.  This is not to
say that the generic implementation will not perform well or even acceptably
well, but only to say that it will never, ever, ever perform better.
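
To make the sorting analogy concrete (my sketch, not Ian's): a generic
comparison sort can assume nothing about the data, while a routine that
knows the values are single bytes can sort them in linear time -- the same
kind of win dedicated hardware gets from knowing its workload in advance.

    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>

    /* Generic path: comparison-based qsort(), O(n log n), knows nothing
     * about the data it is given. */
    static int cmp_u8(const void *a, const void *b)
    {
        return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
    }

    /* Hand-tuned path for a known data set (8-bit values): counting sort,
     * O(n), no comparisons at all. */
    static void sort_u8_counting(uint8_t *data, size_t n)
    {
        size_t count[256] = { 0 };
        for (size_t i = 0; i < n; i++)
            count[data[i]]++;
        for (size_t v = 0, i = 0; v < 256; i += count[v], v++)
            memset(data + i, (int)v, count[v]);
    }

    /* generic:  qsort(data, n, 1, cmp_u8);
     * tuned:    sort_u8_counting(data, n);  -- never slower on this data */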

-- 
Tell that to the Marines!




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

> > If you want to get back to the topic of software rendering, I would be more
> > than happy to oblige.
>
> Better yet, if you are serious - how about furthering your argument with
> patches to optimise and improve Mesa's software paths?

Patches will not do the job.  My ideas include a change in algorithm, not
implementation.  This would involve a huge redesign.  I have SGI's SI and
have been mucking about with it attempting to bring some kind of order to
it.  Once I complete that I will likely start over from scratch and
implement a new design based on scene-capturing and some algorithms I have
created.  Of course I will ensure it passes the conformance tests in the end.
What good is an OpenGL implementation if it does not work as advertised? ;)

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Damien Miller

On Tue, 9 Apr 2002, Raystonn wrote:

> If you want to get back to the topic of software rendering, I would be more
> than happy to oblige.

Better yet, if you are serious - how about furthering your argument with
patches to optimise and improve Mesa's software paths?






Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

> I agree.  You may want to take a look at the following article:
>
> http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1
>
> It shows, among other things, a 400MHz PII with a 3dfx Voodoo2 (hardware
> rasterization) getting almost double the framerate of a 1.4GHz Athlon
> doing software rendering with Quake2 -- and the software rendering is
> not even close to the quality of the hardware rendering due to all the
> shortcuts being taken.

A software implementation of an immediate mode renderer would indeed be
extremely slow.  The main CPU does not yet have access to the kinds of
memory bandwidth that a 3D card does.  I believe a software implementation
of a scene-capture tile-renderer would have much better results.  This is a
more computationally expensive, less bandwidth-intensive algorithm which is
more suited to a CPU's environment.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

> > With the rest I disagree.  The Kyro, for example, has some high-speed local
> > memory (cache) it uses to hold the pixels for a tile.  It can antialias and
> > render translucent scenes without ever blitting the cache to the framebuffer
> > more than once.
>
> It can't have infinite storage for tile information - so there would have to
> be a hard limit on the number of translucent/edge/intersecting tiles - that
> would not be OpenGL compliant.

Each tile has a list of all polygons that might be drawn on its pixel cache.
For a hardware implementation, memory could become an issue.  If memory gets
tight, the implementation could render all polygons currently in its tile
lists and clear out tile memory.  This would trade off memory space for a
bit of overdraw.  For a software implementation this would really not be a
problem.
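
A rough sketch of the structure being described -- per-tile lists of
captured triangles that are flushed (rendered and cleared) when a pool
limit is hit.  All names and limits below are invented for illustration
and are not taken from any real driver; clamping to the framebuffer and
per-bin overflow handling are omitted.

    #include <math.h>

    #define TILE_SIZE    32
    #define BIN_CAPACITY 1024
    #define MAX_BINNED   65536                  /* illustrative pool limit */

    struct tri   { float x0, y0, x1, y1, x2, y2; /* plus colors, uvs, ... */ };
    struct bin   { struct tri *tris[BIN_CAPACITY]; int count; };
    struct scene { struct bin *bins; int tiles_x, tiles_y, total_binned; };

    /* Rasterize every bin into its pixel cache, write each tile out once,
     * then clear the lists (rasterization itself omitted here). */
    static void scene_flush(struct scene *s)
    {
        for (int i = 0; i < s->tiles_x * s->tiles_y; i++)
            s->bins[i].count = 0;
        s->total_binned = 0;
    }

    /* Bin a captured triangle into every tile its bounding box overlaps.
     * If the pool is exhausted, flush first -- trading a little overdraw
     * for bounded memory, exactly as described above. */
    static void scene_add(struct scene *s, struct tri *t)
    {
        if (s->total_binned >= MAX_BINNED)
            scene_flush(s);

        int tx0 = (int)fminf(fminf(t->x0, t->x1), t->x2) / TILE_SIZE;
        int tx1 = (int)fmaxf(fmaxf(t->x0, t->x1), t->x2) / TILE_SIZE;
        int ty0 = (int)fminf(fminf(t->y0, t->y1), t->y2) / TILE_SIZE;
        int ty1 = (int)fmaxf(fmaxf(t->y0, t->y1), t->y2) / TILE_SIZE;

        for (int ty = ty0; ty <= ty1; ty++)
            for (int tx = tx0; tx <= tx1; tx++) {
                struct bin *b = &s->bins[ty * s->tiles_x + tx];
                b->tris[b->count++] = t;
                s->total_binned++;
            }
    }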


> > This is the advantage to tile-based rendering.  Since you only need to
> > hold a tile's worth of pixels, you can use smaller high-speed cache.
>
> Only if the number of visible layers is small enough.

What does the number of visible layers have to do with the ability to break
down processing to a per-tile basis?  I am not following here.


> > As far as the reading of pixels from the framebuffer, this is a highly
> > inefficient thing to do, no matter the hardware.  If you want a fast
> > application you will not attempt to read from the video card's memory.
> > These operations are always extremely slow.
>
> They are only slow if the card doesn't implement them well - but there
> are plenty of techniques (eg impostering) that rely on this kind of thing.

This will always be slow.  AGP transfers are inherently slow compared to
everything else.  DMA transfers are usually used for writing to the video
card, usually passing it vertices and textures.  These transfers can be done
in the background.  How many video cards implement DMA transfers for reads
from the video card?  Even if this was done, most of the time you are
waiting for the results of the read in order to perform another operation.
What good is pushing something into the background when you must wait for
that operation to complete?  These slow transfers and all the waiting make
this an extremely slow process.  It is not recommended.
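
For context, the path being discussed is the standard glReadPixels
readback (this snippet is mine, not from the mail): because the caller
needs the data immediately, the driver must finish all queued rendering
and pull the pixels back across the bus before the call can return.

    #include <GL/gl.h>
    #include <stdlib.h>

    /* Read back an RGBA region of the framebuffer.  The call blocks until
     * outstanding rendering is finished and the transfer (over AGP/PCI on
     * 2002-era hardware) completes -- the waiting described above. */
    static unsigned char *read_back_region(int x, int y, int w, int h)
    {
        unsigned char *pixels = malloc((size_t)w * h * 4);

        glPixelStorei(GL_PACK_ALIGNMENT, 1);
        glReadPixels(x, y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        return pixels;    /* only now can the caller use the result */
    }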

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

> > I agree with the "you have to read pixels back from the frame
> > buffer and
> > then continue rendering polygons."  For a hardware
> > implementation I might
> > agree with the "you need to draw more polygons than your
> > hardware has room
> > to store", but only if the hardware implementation decides to perform
> > overdraw rather than fetching the triangles on the fly from
> > AGP memory.
>
> You need to agree that current hardware does implement the
> scheme where some percentage of pixels is drawn multiple times.
> It's a straightforward hardware design that nicely opens ways
> for getting the performance with an affordable amount of
> ASIC design engineering power. I don't assume the current
> market leaders would have chosen that way if they did not expect to
> get more performance from the approach. In the end I am
> pretty sure that this approach does provide more ways for
> interesting features and effects than the mentioned one-pass
> rendering would provide.

First off, current market leaders began their hardware designs back when the
main CPU was much much slower.  They have an investment in this technology
and likely do not want to throw it away.  Back when these companies were
founded, such 3d rendering could not be performed on the main processor at
all.  The computational power of the main processor has since increased
dramatically.  The algorithmic approach to 3d rendering should be reexamined
with current and future hardware in mind.  What was once true may no longer
be so.

Second, if a processor intensive algorithm was capable of better efficiency
than a bandwidth intensive algorithm, there is a good chance these
algorithms would be moved back over to the main CPU.  If the main processor
took over 3D rendering, what would the 3D card manufacturers sell?  It would
put them out of business essentially.  Therefore you cannot gauge what is
the most efficient algorithm based on what the 3D card manufacturers decide
to push.  They will push whatever is better for their bottom line and their
own future.


> Anyways, the current memory interfaces for the framebuffer memory
> aren't the performance bottleneck at all today. It's the features that
> the applications demand, e.g. n-times texturing.

The features of most games today do cause the current memory interfaces to
be the performance bottleneck.  This is why overclocking your card's memory
offers more of a performance gain than overclocking your card's processor.


> If these one-pass algorithms were so resource-saving,
> why is there only a single hardware implementation, and
> why do the respective software solutions get so little attention?

Why should a 3D card hardware company show interest in something that could
so easily be implemented in software?  How does that benefit their bottom
lines?


> > With the rest I disagree.  The Kyro, for example, has some high-speed local
> > memory (cache) it uses to hold the pixels for a tile.  It can antialias and
> > render translucent scenes without ever blitting the cache to the framebuffer
> > more than once.  This is the advantage to tile-based rendering.  Since you
> > only need to hold a tile's worth of pixels, you can use smaller high-speed
> > cache.
>
> Pixel caches and tiled framebuffers/textures are state of the art
> for most (if not all) current engines. Only looking at the Kyro
> would draw a false view of the market. The Kyro has it too, so it's sort
> of a "me too" product. But a vendor's marketing department will never
> tell you that it is this way.

No, tile buffers cannot be used by immediate mode renderers to eliminate
overdraw.  Immediate mode does not render on a per-pixel basis.  It renders
on a per-polygon basis.  Current hardware engines that use immediate mode
rendering in fact do not make use of tile-based rendering.  They would need
a "tile buffer" the size of the entire framebuffer.  At that point it is no
longer a high speed buffer.  It is simply the framebuffer.  Imagine the cost
of high-speed cache in quantities large enough to hold a full frame buffer,
especially at high resolutions...

While I would prefer to see a software implementation of scene-capture
tile-based rendering, the Kyro was a good first step.  It was the first
mainstream card to use these algorithms.  For this I applaud them.  This was
by no means a "me too product" as you claimed.


> > As far as the reading of pixels from the framebuffer, this is a highly
> > inefficient thing to do, no matter the hardware.  If you want a fast
> > application you will not attempt to read from the video card's memory.
> > These operations are always extremely slow.
>
> For this there are caches (most often generic for nearly any render unit).
> And reading is not that different from writing on current RAM designs.
> Some reading always works without any noticeable impact on performance
> (and it's done for a good bunch of applications and features),
> but if you need much data from the framebuffer, then you might notice it.

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Gareth Hughes

Stephen J Baker wrote:
>
>>>>Everything starts out in hardware and eventually moves to software.
>>>
>>>That's odd - I see the reverse happening.  First we had software
>>
>>The move from hardware to software is an industry-wide pattern for all
>>technology.  It saves money.  3D video cards have been implementing new
>>technologies that were never used in software before.  Once the main
>>processor is able to handle these things, they will be moved into software.
>>This is just a fact of life in the computing industry.  Take a look at what
>>they did with "Winmodems".  They removed hardware and wrote drivers to
>>perform the tasks.  The same thing will eventually happen in the 3D card
>>industry.
>
>
> That's not quite a fair comparison.

I agree.  You may want to take a look at the following article:

http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1

It shows, among other things, a 400MHz PII with a 3dfx Voodoo2 (hardware 
rasterization) getting almost double the framerate of a 1.4GHz Athlon 
doing software rendering with Quake2 -- and the software rendering is 
not even close to the quality of the hardware rendering due to all the 
shortcuts being taken.

What we are seeing, throughout the industry, is a move to programmable 
graphics engines rather than fixed-function ones.  Programmable vertex 
and fragment pipelines are not the same as a software implementation on 
a general purpose CPU, as the underlying hardware still has the special 
functionality needed for 3D graphics.  I suspect that this will continue 
to be true for a very, very long time.

-- Gareth





RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Alexander Stohr

> I agree with the "you have to read pixels back from the frame buffer and
> then continue rendering polygons."  For a hardware implementation I might
> agree with the "you need to draw more polygons than your hardware has room
> to store", but only if the hardware implementation decides to perform
> overdraw rather than fetching the triangles on the fly from AGP memory.

You need to agree that current hardware does implement the
scheme where some percentage of pixels is drawn multiple times.
It's a straightforward hardware design that nicely opens ways
for getting the performance with an affordable amount of
ASIC design engineering power.  I don't assume the current
market leaders would have chosen that way if they did not expect to
get more performance from the approach.  In the end I am
pretty sure that this approach does provide more ways for
interesting features and effects than the mentioned one-pass
rendering would provide.

Anyways, the current memory interfaces for the framebuffer memory
aren't the performance bottleneck at all today.  It's the features that
the applications demand, e.g. n-times texturing.

If these one-pass algorithms were so resource-saving,
why is there only a single hardware implementation, and
why do the respective software solutions get so little attention?
The only reason I can see is that they do not work as
effectively or increase performance as much.  To be honest, you must
subtract the preprocessing time from the rendering gain.
And you must expect the adapter not to render at full speed
because it is running idle for some time due to CPU reasons.

> With the rest I disagree.  The Kyro, for example, has some high-speed local
> memory (cache) it uses to hold the pixels for a tile.  It can antialias and
> render translucent scenes without ever blitting the cache to the framebuffer
> more than once.  This is the advantage to tile-based rendering.  Since you
> only need to hold a tile's worth of pixels, you can use smaller high-speed
> cache.

Pixel caches and tiled framebuffers/textures are state of the art
for most (if not all) current engines.  Only looking at the Kyro
would draw a false view of the market.  The Kyro has it too, so it's sort
of a "me too" product.  But a vendor's marketing department will never
tell you that it is this way.

> As far as the reading of pixels from the framebuffer, this is a highly
> inefficient thing to do, no matter the hardware.  If you want a fast
> application you will not attempt to read from the video card's memory.
> These operations are always extremely slow.

For this there are caches (most often generic for nearly any render unit).
And reading is not that different from writing on current RAM designs.
Some reading always works without any noticeable impact on performance
(and it's done for a good bunch of applications and features),
but if you need much data from the framebuffer, then you might notice it.
The closer the pixel-consuming circuit is to the RAM, the better it
will work.  A CPU is one of the not-so-good consumers of pixels.

> I still maintain that immediate mode rendering is an
> inefficient algorithm designed to favor the use of memory
> over computations.

Hmm, the current state of the art is called display-list-based rendering,
and it is up to date and nicely optimized even though the concept is
an older one.  It takes the good parts of both worlds: fast overdraw
rendering into memory and a higher level of primitive preprocessing.
With only a single comparison on a preprocessed display list you
can quickly decide whether that display list needs to be sent to the
graphics adapter.
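
For readers who have not used them, this is the mechanism being referred
to -- a minimal OpenGL 1.x display-list sketch (the geometry and names
are invented): the geometry is compiled once, and each frame a single
cheap decision replays the whole preprocessed batch.

    #include <GL/gl.h>

    static GLuint rock_list;

    /* Compile the static geometry once; the driver may preprocess and
     * cache it in whatever form suits the hardware. */
    static void build_rock_list(void)
    {
        rock_list = glGenLists(1);
        glNewList(rock_list, GL_COMPILE);
        glBegin(GL_TRIANGLES);
            glVertex3f(-1.0f, 0.0f, 0.0f);
            glVertex3f( 1.0f, 0.0f, 0.0f);
            glVertex3f( 0.0f, 1.5f, 0.0f);
            /* ... more static geometry ... */
        glEnd();
        glEndList();
    }

    /* Per frame: one comparison (a visibility test, say) decides whether
     * the preprocessed list is sent to the graphics adapter at all. */
    static void draw_rock(int visible)
    {
        if (visible)
            glCallList(rock_list);
    }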

Just believe that performance is only at an optimum level if
you are able to take the best of the two worlds - extreme overdraw
rendering is not good for performance, nor is intense geometric
preprocessing on a per-frame basis a viable way to performance.
The hardware industry has found nice ways of combining both of
these technologies to give you the best of both worlds and thus
the highest performance.  And they are developing further in both of
those areas and a few others more.

Regards, Alex.





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Raystonn

> > I still maintain that immediate mode rendering is an inefficient algorithm
> > designed to favor the use of memory over computations.  A better algorithm
> > will always win out given enough time to overtake the optimized versions of
> > the more inefficient algorithms.
>
> Perhaps you've forgotten what you originally said? The kyro is a graphics card.
>
> But still, hand-waving v real-world pragmatic performance figures matter
> more, and here your Kyro and P4 lose.
>
> It really doesn't matter if algo (a) is better than (b). To progress
> your argument you need to prove[1] that algo (a) is at least as good as,
> and as cheap, in software on the P4 as either some other algo or the
> same one in a graphics card. Whilst still allowing that processor to
> perform other functions.
>
> [1] with numbers not with rhetoric.

The first paragraph (the one you chose to quote) has nothing to do with
implementing it in software.  That was an entirely different discussion.
This discussion is currently about the new topic of whether or not
scene-capture tile-based rendering is more efficient than immediate mode
rendering.  I maintain that it is, and have included my arguments in my last
post.

If you want to get back to the topic of software rendering, I would be more
than happy to oblige.  But please don't quote arguments for a point in one
debate and show them to be inadequate for proving a point in a prior debate.
The top paragraph was not intended to support any argument regarding
software rendering.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Michael

On Mon, Apr 08, 2002 at 06:17:59PM -0700, Raystonn wrote:

> I still maintain that immediate mode rendering is an inefficient algorithm
> designed to favor the use of memory over computations.  A better algorithm
> will always win out given enough time to overtake the optimized versions of
> the more inefficient algorithms.

Perhaps you've forgotten what you originally said? The kyro is a graphics card.

But still, hand-waving v real-world pragmatic performance figures matter
more, and here your Kyro and P4 lose.

It really doesn't matter if algo (a) is better than (b). To progress
your argument you need to prove[1] that algo (a) is at least as good as,
and as cheap, in software on the P4 as either some other algo or the
same one in a graphics card. Whilst still allowing that processor to
perform other functions.

[1] with numbers not with rhetoric.

-- 
Michael.




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Allen Akin

On Mon, Apr 08, 2002 at 06:17:59PM -0700, Raystonn wrote:
| As far as the reading of pixels from the framebuffer, this is a highly
| inefficient thing to do, no matter the hardware.

It doesn't have to be; that's just a tradeoff made by the hardware
designers depending on the applications for which their systems are
intended.

Reading previously-rendered pixels is useful for things like
dynamically-constructed environment maps, shadow maps, correction for
projector optics, film compositing, and parallel renderers.  There are
various ways hardware can assist these operations, and various ways
tiled renderers interact with them, but that discussion is too lengthy
for this note.  At any rate, the ability to use the results of previous
renderings is a pretty important capability.

| I still maintain that immediate mode rendering is an inefficient algorithm
| designed to favor the use of memory over computations.

An important design characteristic of immediate mode is that it allows
the application to determine the rendering order.  This helps achieve
certain rendering effects (such as those Steve described earlier), but
it can also be a *huge* efficiency win if the scene involves expensive
mode changes, such as texture loads/unloads.  Check out the original
Reyes paper for a good quantitative discussion of this sort of issue.
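
A small sketch of the ordering control Allen is describing (mine, with
hypothetical structures): because the application chooses submission
order under immediate mode, it can group draws by texture so each
expensive bind or upload happens once per frame rather than once per
object.

    #include <stdlib.h>
    #include <GL/gl.h>

    struct draw_call {
        GLuint texture;                          /* texture this batch needs */
        void (*emit)(const struct draw_call *);  /* issues the actual geometry */
    };

    static int by_texture(const void *a, const void *b)
    {
        GLuint ta = ((const struct draw_call *)a)->texture;
        GLuint tb = ((const struct draw_call *)b)->texture;
        return (ta > tb) - (ta < tb);
    }

    /* Submit a frame with draws grouped by texture, so an expensive mode
     * change (a bind, possibly a texture load) happens once per texture. */
    static void draw_frame(struct draw_call *calls, size_t n)
    {
        qsort(calls, n, sizeof calls[0], by_texture);

        GLuint bound = 0;                        /* 0 = default texture */
        for (size_t i = 0; i < n; i++) {
            if (calls[i].texture != bound) {
                glBindTexture(GL_TEXTURE_2D, calls[i].texture);
                bound = calls[i].texture;
            }
            calls[i].emit(&calls[i]);
        }
    }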

Allen




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Raystonn

> > The games perform overdraw, sure.  But I am talking about at the pixel
> > level.  A scene-capture algorithm performs 0 overdraw, regardless of what
> > the game sends it.
>
> That's not true.  I've designed and built machines like this and I know.
>
> You still need overdraw when:
>
>   * you are antialiasing a polygon edge.
>   * you are rendering translucent surfaces.
>   * you need more textures on a polygon than your
> hardware can render in a single pass.
>   * you have to read pixels back from the frame buffer and then
> continue rendering polygons.
>   * polygons get smaller than a pixel in width or height.
>   * you need to draw more polygons than your hardware has
> room to store.

I agree with the "you have to read pixels back from the frame buffer and
then continue rendering polygons."  For a hardware implementation I might
agree with the "you need to draw more polygons than your hardware has room
to store", but only if the hardware implementation decides to perform
overdraw rather than fetching the triangles on the fly from AGP memory.

With the rest I disagree.  The Kyro, for example, has some high-speed local
memory (cache) it uses to hold the pixels for a tile.  It can antialias and
render translucent scenes without ever blitting the cache to the framebuffer
more than once.  This is the advantage to tile-based rendering.  Since you
only need to hold a tile's worth of pixels, you can use smaller high-speed
cache.

As far as the reading of pixels from the framebuffer, this is a highly
inefficient thing to do, no matter the hardware.  If you want a fast
application you will not attempt to read from the video card's memory.
These operations are always extremely slow.

I still maintain that immediate mode rendering is an inefficient algorithm
designed to favor the use of memory over computations.  A better algorithm
will always win out given enough time to overtake the optimized versions of
the more inefficient algorithms.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Stephen J Baker

On Thu, 4 Apr 2002, Raystonn wrote:

> The games perform overdraw, sure.  But I am talking about at the pixel
> level.  A scene-capture algorithm performs 0 overdraw, regardless of what
> the game sends it.

That's not true.  I've designed and built machines like this and I know.

You still need overdraw when:

  * you are antialiasing a polygon edge.
  * you are rendering translucent surfaces.
  * you need more textures on a polygon than your
hardware can render in a single pass.
  * you have to read pixels back from the frame buffer and then
continue rendering polygons.
  * polygons get smaller than a pixel in width or height.
  * you need to draw more polygons than your hardware has
room to store.

...I'm sure there are other reasons too.

>  This reduces fillrate needs greatly.

It reduces it (in my experience) by a factor of between 2 and 4 depending
on the nature of the scene.  You can easily invent scenes that show much
more benefit - but they tend to be contrived cases that don't crop up
much in real applications because of things like portal culling.

> > Also, in order to use scene capture, you are reliant on the underlying
> > graphics API to be supportive of this technique.  Neither OpenGL nor
> > Direct3D are terribly helpful.
>
> Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL.

They do - but they perform extremely poorly for OpenGL programs that
do anything much more complicated than just throwing a pile of polygons
at the display.  As soon as you get into reading back pixels for any
reason, any scene-capture system has to render the polygons it has
before the program can access the pixels in the frame buffer.

> > > Everything starts out in hardware and eventually moves to software.
> >
> > That's odd - I see the reverse happening.  First we had software
>
> The move from hardware to software is an industry-wide pattern for all
> technology.  It saves money.  3D video cards have been implementing new
> technologies that were never used in software before.  Once the main
> processor is able to handle these things, they will be moved into software.
> This is just a fact of life in the computing industry.  Take a look at what
> they did with "Winmodems".  They removed hardware and wrote drivers to
> perform the tasks.  The same thing will eventually happen in the 3D card
> industry.

That's not quite a fair comparison.

Modems can be moved into software because there is no need for them *EVER*
to get any faster.  All modern modems can operate faster than any standard
telephone line and are in essence *perfect* devices that cannot be improved
upon in any way.  Hence a hardware modem that would run MUCH faster than the
CPU would be easy to build - but we don't because it's just not useful.
That artificial limit on the speed of a modem is the only thing that allows
software to catch up with hardware and make it obsolete.

We might expect sound cards to go the same way - once they get fast enough
to produce any conceivable audio experience that the human perceptual
system can comprehend - then there is a chance for software audio to catch
up.  That hasn't happened yet - which is something I find rather surprising.

But that's in no way analogous to the graphics situation where we'll continue
to need more performance until the graphics you can draw are completely
photo-realistic - indistinguishable from the real world - and operate over
the complete visual field at eye-limiting resolution.  We are (in my
estimation) still at least three orders of magnitude in performance
away from that pixel fill rate and far from where we need to be in
terms of realism and polygon rates.


Steve Baker  (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation & Training (817)619-2466 (Fax)
Work: [EMAIL PROTECTED]   http://www.link.com
Home: [EMAIL PROTECTED]   http://www.sjbaker.org





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread Raystonn

> > Yes, some details were left out of CPU performance increases.  The same was
> > done for memory performance increases though.  We have been discussing
> > memory bandwidth as memory performance, completely leaving out memory
> > latency, which has also improved tremendously.
>
> Pardon me, but I haven't seen this wonderful improvement.
>
> I benchmarked several machines a while back:
> - a P200 with a TX chipset board and EDO DRAM
> - a P2-266 with an LX chipset and PC-66 SDRAM
> - a K6-III/550 with a Via MVP3 chipset and PC-100 SDRAM
>
> The P200 pulled off about 75MBytes/sec; the P2-266 pulled off about 55
> MBytes/sec; the K6-III/550 pulled off about 100MBytes/sec. All of this was
> done under Linux; tests were performed with memtest86 (? it's been a while,
> basically though they were not performed under any operating system other
> than that which was on the floppy).
>
> This doesn't support your conclusions here. I would hazard a guess that
> memory performance there had more to do with the chipset involved than
> superior memory technology.



You stopped your measurements with a processor that is around 5 years old.
It is no wonder you got such low results.  Now compare your EDO RAM results,
from about 5 years ago, to my current results:

Pentium 4 2.4GHz, 400MHz FSB, i850 chipset with RDRAM:
L1 cache: 19730MB/s, L2 cache: 16833MB/s, Memory: 1425MB/s

My results are 19 times better than your results on the P200 with EDO RAM.
This basically proves my case here.
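
For anyone who wants to reproduce figures of this kind, a crude
copy-bandwidth probe is easy to write (a sketch only -- dedicated tools
such as memtest86 or STREAM control for caches and prefetching far more
carefully):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Crude sequential-copy bandwidth estimate.  The buffers are far
     * larger than L1/L2, so the figure reflects main memory. */
    int main(void)
    {
        const size_t bytes = 64u * 1024 * 1024;
        const int passes = 16;
        char *src = malloc(bytes), *dst = malloc(bytes);
        memset(src, 1, bytes);               /* touch the pages before timing */
        memset(dst, 0, bytes);

        clock_t t0 = clock();
        for (int i = 0; i < passes; i++)
            memcpy(dst, src, bytes);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        /* each pass reads and writes `bytes`, hence the factor of 2 */
        printf("~%.0f MB/s\n", passes * 2.0 * bytes / (1024.0 * 1024.0) / secs);
        free(src);
        free(dst);
        return 0;
    }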

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread David Bronaugh

On Fri, 5 Apr 2002 17:11:26 -0800
"Raystonn" <[EMAIL PROTECTED]> wrote:
> Yes, some details were left out of CPU performance increases.  The same was
> done for memory performance increases though.  We have been discussing
> memory bandwidth as memory performance, completely leaving out memory
> latency, which has also improved tremendously.

Pardon me, but I haven't seen this wonderful improvement.

I benchmarked several machines a while back:
- a P200 with a TX chipset board and EDO DRAM
- a P2-266 with an LX chipset and PC-66 SDRAM
- a K6-III/550 with a Via MVP3 chipset and PC-100 SDRAM

The P200 pulled off about 75MBytes/sec; the P2-266 pulled off about 55 MBytes/sec; the 
K6-III/550 pulled off about 100MBytes/sec. All of this was done under Linux; tests 
were performed with memtest86 (? it's been a while, basically though they were not 
performed under any operating system other than that which was on the floppy).

This doesn't support your conclusions here. I would hazard a guess that memory 
performance there had more to do with the chipset involved than superior memory 
technology.

David Bronaugh




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread Raystonn

> > > OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.
> >
> > No, in this 5 year period, processor clockspeed has moved from approximately
> > 200MHz to over 2GHz.  This is a factor of 10 in CPU growth and 16 in memory
> > bandwidth.  Memory bandwidth is growing more quickly than processor
> > clockspeed now.
>
> Uhm...you've fallen into Intel's clockspeed trap.  I'm assuming that you're
> talking about a 200MHz Pentium vs. a 2GHz Pentium4.  In the best case, the
> Pentium could issue two instructions at once, whereas a Pentium4 or Athlon
> can issue (and retire) many, many more.  Not only that, the cycle times of
> many instructions (such as multiply and divide) have decreased.  A 2GHz
> Pentium would still be crushed by a 2GHz Pentium4.  Just like a 200MHz
> Pentium would crush a 200MHz 286. :)

Yes, some details were left out of CPU performance increases.  The same was
done for memory performance increases though.  We have been discussing
memory bandwidth as memory performance, completely leaving out memory
latency, which has also improved tremendously.

-Raystonn





Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread Ian Romanick

On Thu, Apr 04, 2002 at 09:30:39PM -0800, Raystonn wrote:

> > OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.
> 
> No, in this 5 year period, processor clockspeed has moved from approximately
> 200MHz to over 2GHz.  This is a factor of 10 in CPU growth and 16 in memory
> bandwidth.  Memory bandwidth is growing more quickly than processor
> clockspeed now.

Uhm...you've fallen into Intel's clockspeed trap.  I'm assuming that you're
talking about a 200MHz Pentium vs. a 2GHz Pentium4.  In the best case, the
Pentium could issue two instructions at once, whereas a Pentium4 or Athlon
can issue (and retire) many, many more.  Not only that, the cycle times of
many instructions (such as multiply and divide) have decreased.  A 2GHz
Pentium would still be crushed by a 2GHz Pentium4.  Just like a 200MHz
Pentium would crush a 200MHz 286. :)

-- 
Tell that to the Marines!




Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Raystonn

> Yes - and yet they still have horrible problems every time you have
> a conditional branch instruction.  That's because they are trying

Not really.  The Pentium 4 has a very efficient branch prediction unit.
Most of the time it guesses the correct branch to take.  When the actual
branch is computed, it stores this information for later.  Next time that
branch is encountered it analyzes the stored information and bases its
decision on that.  Conditional branches are much less of a problem now.


> >  Most of
> > the processing power of today's CPUs go completely unused.  It is possible
> > to create optimized implementations using Single-Instruction-Multiple-Data
> > (SIMD) instructions of efficient algorithms.
>
> Which is a way of saying "Yes, you could do fast graphics on the CPU
> if you put the GPU circuitry onto the CPU chip and pretend that it's
> now part of the core CPU".

What does this have to do with adding GPU hardware to the CPU?  These SIMD
instructions are already present on modern processors in the form of SSE and
SSE2.
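
As a concrete example of the kind of SIMD use being argued for (a sketch
using SSE2 intrinsics, which the Pentium 4 of that era provided): a 50/50
blend of two RGBA8 spans, processed sixteen bytes -- four whole pixels --
per instruction instead of one channel at a time.

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>

    /* 50/50 blend of two RGBA8 spans, 4 pixels (16 bytes) per operation.
     * Sketch only: assumes n_pixels is a multiple of 4 and 16-byte-aligned
     * pointers; a real path needs a scalar tail and unaligned loads. */
    static void blend_half_sse2(uint8_t *dst, const uint8_t *a,
                                const uint8_t *b, int n_pixels)
    {
        for (int i = 0; i < n_pixels; i += 4) {
            __m128i va = _mm_load_si128((const __m128i *)(a + 4*i));
            __m128i vb = _mm_load_si128((const __m128i *)(b + 4*i));
            _mm_store_si128((__m128i *)(dst + 4*i), _mm_avg_epu8(va, vb));
        }
    }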


> > We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
> > to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
> > over 16 times the memory bandwidth available today than we did just 5 years
> > ago.  Available memory bandwidth has been growing more quickly than
> > processor clockspeed lately, and I do not foresee an end to this any time
> > soon.
>
> OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.

No, in this 5 year period, processor clockspeed has moved from approximately
200MHz to over 2GHz.  This is a factor of 10 in CPU growth and 16 in memory
bandwidth.  Memory bandwidth is growing more quickly than processor
clockspeed now.


> > Overutilised in my opinion.  The amount of overdraw performed by today's
> > video cards on modern games and applications is incredible.  Immediate mode
> > rendering is an inefficient algorithm.  Video cards tend to have extremely
> > well optimized implementations of this inefficient algorithm.
>
> That's because games *NEED* to do lots of overdraw.  They are actually

The games perform overdraw, sure.  But I am talking about at the pixel
level.  A scene-capture algorithm performs 0 overdraw, regardless of what
the game sends it.  This reduces fillrate needs greatly.


> > Kyro-based video cards perform quite well.  They are not quite up to the
> > level of nVidia's latest cards...
>
> Not *quite*!!! Their best card is significantly slower than
> a GeForce 2MX - that's four generations of nVidia technology
> ago.

This has nothing to do with the algorithms themselves.  It merely has to do with
the company's ability to scale its hardware.  A software implementation
would not be limited in this manner.  It could take advantage of the
processor manufacturer's ability to scale speeds much more easily.


> Also, in order to use scene capture, you are reliant on the underlying
> graphics API to be supportive of this technique.  Neither OpenGL nor
> Direct3D are terribly helpful.

Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL.
Any game you can play using an nVidia card you can also play using a
Kyro-based card.


> > > Everything that is speeding up the main CPU is also speeding up
> > > the graphics processor - faster silicon, faster busses and faster
> > > RAM all help the graphics just as much as they help the CPU.
> >
> > Everything starts out in hardware and eventually moves to software.
>
> That's odd - I see the reverse happening.  First we had software

The move from hardware to software is an industry-wide pattern for all
technology.  It saves money.  3D video cards have been implementing new
technologies that were never used in software before.  Once the main
processor is able to handle these things, they will be moved into software.
This is just a fact of life in the computing industry.  Take a look at what
they did with "Winmodems".  They removed hardware and wrote drivers to
perform the tasks.  The same thing will eventually happen in the 3D card
industry.


> As CPU's get faster, graphics cards get *MUCH* faster.

This has mostly to do with memory bandwidth.  The processors on the video
cards are not all that impressive by themselves.  Memory bandwidth available
to the CPU is increasing rapidly.


> CPU's aren't "catching up" - they are getting left behind.

I disagree.  What the CPU lacks in hardware units it makes up with sheer
clockspeed.  A video card may be able to perform 10 times as many operations
per clock cycle as a CPU.  But if that CPU is operating at over 10 times the
clockspeed, who cares?  It will eventually be faster.  Video card
manufacturers cannot scale clockspeed anywhere near as well as Intel.


> They are adding in steps to the graphics processing that are programmable.

And this introduces the same problems that the main CPU is much better at
dealing with.  Branch prediction and other software issues have been highly

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Stephen J Baker

On Tue, 2 Apr 2002, Raystonn wrote:

> > That is far from the truth - they have internal pipelining
> > and parallelism.  Their use of silicon can be optimised to balance
> > the performance of just one single algorithm.  You can never do that
> > for a machine that also has to run an OS, word process and run
> > spreadsheets.
>
> Modern processors have internal pipelining and parallelism as well.

Yes - and yet they still have horrible problems every time you have
a conditional branch instruction.  That's because they are trying
to convert a highly linear operation (code execution) into some
kind of a parallel form.  Graphics is easier though.  Each pixel and
each polygon can be treated as a stand-alone entity and can be
processed in true parallelism.

>  Most of
> the processing power of today's CPUs go completely unused.  It is possible
> to create optimized implementations using Single-Instruction-Multiple-Data
> (SIMD) instructions of efficient algorithms.

Which is a way of saying "Yes, you could do fast graphics on the CPU
if you put the GPU circuitry onto the CPU chip and pretend that it's
now part of the core CPU".

I'll grant you *that* - but it's not the same thing as doing the
graphics in software.

> > Since 1989, CPU speed has grown by a factor of 70.  Over the same
> > period the memory bus has increased by a factor of maybe 6 or so.
>
> We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
> to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
> over 16 times the memory bandwidth available today than we did just 5 years
> ago.  Available memory bandwidth has been growing more quickly than
> processor clockspeed lately, and I do not foresee an end to this any time
> soon.

OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.
My argument remains - and remember that whenever RAM gets faster,
so do the graphics cards.  You can run faster - but you can't catch up
if the other guy is also running faster.

> > On the other hand, the graphics card can use heavily pipelined
> > operations to guarantee that the memory bandwidth is 100% utilised
>
> Overutilised in my opinion.  The amount of overdraw performed by today's
> video cards on modern games and applications is incredible.  Immediate mode
> rendering is an inefficient algorithm.  Video cards tend to have extremely
> well optimized implementations of this inefficient algorithm.

That's because games *NEED* to do lots of overdraw.  They are actually
pretty smart about eliminating the 'obvious' cases by doing things
like portal culling.  Most of the overdraw comes from needing to do
multipass rendering (IIRC, the new Return To Castle Wolfenstein game
uses up to 12 passes to render some polygons).  The overdraw due to
that kind of thing is rather harder to eliminate with algorithmic
sophistication.  If you need that kind of surface quality, your
bandwidth out of memory will be high no matter what.

> Kyro-based video cards perform quite well.  They are not quite up to the
> level of nVidia's latest cards...

Not *quite*!!! Their best card is significantly slower than
a GeForce 2MX - that's four generations of nVidia technology
ago.

I agree that if this algorithm were to be implemented on a card
with the *other* capabilities of an nVidia card - then it would improve
the fill rate by perhaps a factor of two or four. (Before you argue
about that - realise that I've designed *and* built hardware and software
using this technology - and I've MEASURED its performance for 'typical'
scenes).

But you can only draw scenes where the number of polygons being rendered
can fit into the 'scene capture' buffer.  And that's the problem with
that technology.

If I want to draw a scene with a couple of million polygons in it (perfectly
possible with modern cards) then those couple of million polygons have
to be STORED ON THE GRAPHICS CARD.  That's a big problem for an affordable
graphics card.

Adding another 128Mb of fast RAM to store the scene in costs a lot more
than doubling the amount of processing power on the GPU.  The amount of
RAM on the chip becomes a major cost driver for a $120 card.

None of those issues affect a software solution though - and it's
possible that a scene capture solution *could* be better than a
conventional immediate mode renderer - but I still think that
it will at MOST only buy you a factor of 2x or 4x pixel rate speedup
and you have a MUCH larger gap than that to hurdle.

Also, in order to use scene capture, you are reliant on the underlying
graphics API to be supportive of this technique.  Neither OpenGL nor
Direct3D are terribly helpful.  You can write things like:

   Render 100 polygons.
   Read back the image they created.

   if the pixel at (123,456) is purple then
   {
 put that image into texture memory.

 Render another 100 polygons using the texture
 you just created.
   }

...scene capture algorithms have a very hard time with things like
that becau

Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Sergey V. Udaltsov

> > Does 4 do pixel-based fog?
> yep.
So in some cases it is much slower than 3, isn't it?

> It's because they are quite similar operations so they use the same chip 
> logic. In fact you have a bit to choose whether you want alpha or fog. It
> was a design option.
So they did not want to have two instances of the same logic on a single
chip to implement this "rare" combination, did they?

> Don't know, but a transparent window in a foggy level is not a situation 
> very hard to happen...
I see.

> In this case the visual difference can be very big...
Sad. Shame on ATI! :)

Thanks for all these clarifications. They're really interesting matters
for me.

Sergey




Re: [Dri-devel] Mesa software blending

2002-04-04 Thread José Fonseca

On 2002.04.04 09:44 Sergey V. Udaltsov wrote:
> > But the OpenGL spec says that the fog color is calculated on a _pixel_
> > basis and not on a _vertex_ basis. Indeed the result is different,
> > especially in long polygons that span from the front way to the back.
> Does 4 do pixel-based fog?
> 

yep.

> > Mach64 is able to do the fog properly, i.e., on a pixel basis, but _not_
> Why? I know - it's only ATI who can answer this question...:)

It's because they are quite similar operations, so they use the same chip
logic. In fact you have a bit to choose whether you want alpha or fog. It
was a design option.

> > when alpha blending since it uses the path on chip. So the problem is only
> > what to do when both fog and alpha blending are enabled.
> Are there many apps using these effects together?
> 

Don't know, but a transparent window in a foggy level is not a situation
that is very unlikely to happen...

> > The solution of using these depending of the contents of a env var is a
> > compromise so that gamers achieve a better gameplay sacrifying a little
> > the visual quality and the OpenGL conformance.
> Actually, end users in 80% (or 99%?) do not specially care about
> conformance. The visual quality really matters.
>
> > There are other situations as this one. Leif checked on Unreal and there
> > is one (also when alpha blending) that happens and according with his
> > experiments reverting to software leads to a severe performance hit.
> It was predictable, wasn't it? And any predictions about _visual_
> difference between these two methods? Will users see the difference
> easily? Say, if you get 10* speedup with 5% worse quality (I do not
> really know how to measure it though:) - almost nobody will really use
> SW mode.
> 
In this case the visual difference can be very big...

> ...
> 
> Cheers,
> 
> Sergey
> 

José Fonseca




Re: [Dri-devel] Mesa software blending

2002-04-04 Thread José Fonseca

On 2002.04.04 09:08 Keith Whitwell wrote:
> On Thu, 4 Apr 2002 01:56:22 +0100
> José Fonseca <[EMAIL PROTECTED]> wrote:
> 
> > ...
> 
> > the further away is the vertex more its color is nearer to the fog
> > background color. Since the colors are interpolated in the triangle
> this
> > gave the impression of fog.
> >
> > But the OpenGL spec says that the fog color is calculated on a _pixel_
> > basis and not on a _vertex_ basis. Indeed the result is different,
> > especially in long polygons that span from the front way to the back.
> 
> 
> Actually I think it gives you scope to do either.

Well, the OpenGL 1.3 spec, sec. 3.10, says: "Further, f need not be computed at
each fragment, but may be computed at each vertex and interpolated as
other data are.", but it also says "If enabled, fog blends a fog color
with a rasterized fragment's post-texturing color using a blending factor
f."  Since it's applied later in the pipeline, I'm not sure that the results
will be the same, even without texturing.
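
Written out, the blend the spec describes looks like the sketch below
(GL_LINEAR mode only; the EXP and EXP2 modes just compute f differently).
Evaluating f per fragment matches the spec; computing it per vertex and
interpolating -- the Mesa 3.x trick discussed here -- is the
approximation.

    /* GL_LINEAR fog factor for a fragment at eye-space distance z:
     *     f = (end - z) / (end - start), clamped to [0, 1]
     * Final color:  C = f * Cr + (1 - f) * Cf,
     * where Cr is the rasterized (post-texturing) color and Cf the fog
     * color. */
    static float linear_fog_factor(float z, float start, float end)
    {
        float f = (end - z) / (end - start);
        return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
    }

    static void apply_fog(float frag_rgb[3], const float fog_rgb[3], float f)
    {
        for (int i = 0; i < 3; i++)
            frag_rgb[i] = f * frag_rgb[i] + (1.0f - f) * fog_rgb[i];
    }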

> The issue with adding
> fog into the vertex colors is what happens when texturing is turned on -
> the vertex color may not even contribute to the color of the generated
> fragments.

I see, it depends on the texture environment... so unless there is a way 
around it we will have to fall back to software in these cases no matter what.
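
A minimal illustration of why the texture environment matters here (these are
just standard OpenGL calls, not driver code): with GL_MODULATE the
interpolated vertex color still scales the texel, so fog folded into the
vertex colors shows up, but with GL_REPLACE the texel overrides it entirely.

   /* Fog baked into the vertex colors survives this mode... */
   glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

   /* ...but is completely discarded in this one, since the fragment
    * color comes straight from the texture. */
   glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);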

> 
> Keith
> 

José Fonseca

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Sergey V. Udaltsov

> But the OpenGL spec says that the fog color is calculated on a _pixel_ 
> basis and not on a _vertex_ basis. Indeed the result is different, 
> especially in long polygons that span from the front way to the back.
Does Mesa 4 do pixel-based fog?

> Mach64 is able to do the fog properly, i.e., on a pixel basis, but _not_ 
Why? I know - it's only ATI who can answer this question...:)
> when alpha blending since it uses the path on chip. So the problem is only 
> what to do when both fog and alpha blending are enabled.
Are there many apps using these effects together?

> The solution of using these depending of the contents of a env var is a 
> compromise so that gamers achieve a better gameplay sacrifying a little 
> the visual quality and the OpenGL conformance.
Actually, 80% (or 99%?) of end users do not especially care about
conformance. The visual quality is what really matters.

> There are other situations as this one. Leif checked on Unreal and there 
> is one (also when alpha blending) that happens and according with his 
> experiments reverting to software leads to a severe performance hit.  
It was predictable, wasn't it? And any predictions about _visual_
difference between these two methods? Will users see the difference
easily? Say, if you get a 10x speedup with 5% worse quality (I do not
really know how to measure it, though:) - almost nobody will really use
SW mode.

> > Cool! And the default version will be HW-based non-conformant, won't it?
> This is very subjective, but if we assume that DRI aims to be OpenGL 
> conformant, I vote for sw-based conformant...
:)) I see your point. Again "real life vs standards" :). It's time for a
new poll on dri.sourceforge.net :)

> > BTW, I can seriously recommend 3ddesktop as a test tool. It supports
> > several effects (blending, textures, etc. with on/off switching) so its
> > behavior could give a lot of hints to the developers.
> I'll check it.
Thanks a lot. I'm not lobbying for it. I just like it - and would like it to
work properly on Mach64 - today I have a lot of problems with it which
I don't have in pure SW mode.

Cheers,

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Keith Whitwell

On Thu, 4 Apr 2002 01:56:22 +0100
José Fonseca <[EMAIL PROTECTED]> wrote:

> On 2002.04.03 23:50 Sergey V. Udaltsov wrote:
> > > He!He!.. you missed!! It's a mix of variant 1 and 2..! :)
> > Cool. At least 1+2 is the answer (call it 5). Thanks.
> > 
> > > As Leif previously said is his reply, it can be done in hardware by
> > > messing the colors of the vertex to incorporate the fog (as software
> > Mesa
> > > used to do in 3.x) but it's non-conformant to the OpenGL spec.
> > What's the difference in Mesa 4? Is it better? Can it be done in HW?
> > Will this non-conformance cause visible problems on display?
> 
> In Mesa 3.x when fog was enabled the vertex colors were changed so that 
> the further away is the vertex more its color is nearer to the fog 
> background color. Since the colors are interpolated in the triangle this 
> gave the impression of fog.
> 
> But the OpenGL spec says that the fog color is calculated on a _pixel_ 
> basis and not on a _vertex_ basis. Indeed the result is different, 
> especially in long polygons that span from the front way to the back.


Actually I think it gives you scope to do either.  The issue with adding fog into the 
vertex colors is what happens when texturing is turned on - the vertex color may not 
even contribute to the color of the generated fragments.

Keith

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-03 Thread José Fonseca

On 2002.04.03 23:50 Sergey V. Udaltsov wrote:
> > He!He!.. you missed!! It's a mix of variant 1 and 2..! :)
> Cool. At least 1+2 is the answer (call it 5). Thanks.
> 
> > As Leif previously said is his reply, it can be done in hardware by
> > messing the colors of the vertex to incorporate the fog (as software
> Mesa
> > used to do in 3.x) but it's non-conformant to the OpenGL spec.
> What's the difference in Mesa 4? Is it better? Can it be done in HW?
> Will this non-conformance cause visible problems on display?

In Mesa 3.x, when fog was enabled the vertex colors were changed so that 
the further away a vertex was, the closer its color was to the fog 
background color. Since the colors are interpolated across the triangle, this 
gave the impression of fog.

But the OpenGL spec says that the fog color is calculated on a _pixel_ 
basis and not on a _vertex_ basis. Indeed the result is different, 
especially in long polygons that span all the way from the front to the back.
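
In rough C terms (only a sketch of the idea, not the actual Mesa code;
fog_factor[] holds one precomputed fog factor per vertex):

   /* Spec-style blend of one color channel with the fog color. */
   static float fog_blend(float f, float color, float fog)
   {
      return f * color + (1.0f - f) * fog;
   }

   /* Mesa 3.x trick: one fog factor per *vertex*, folded into the vertex
    * colors before rasterization; the rasterizer then interpolates the
    * already-fogged colors across the triangle.  The conformant path
    * instead evaluates fog_blend() per *fragment*, after texturing. */
   static void fog_vertex_colors(float (*color)[3], const float *fog_factor,
                                 int count, const float fog_color[3])
   {
      int i, c;
      for (i = 0; i < count; i++)
         for (c = 0; c < 3; c++)
            color[i][c] = fog_blend(fog_factor[i], color[i][c], fog_color[c]);
   }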

Mach64 is able to do the fog properly, i.e., on a pixel basis, but _not_ 
when alpha blending, since both use the same path on the chip. So the problem is only 
what to do when both fog and alpha blending are enabled.

The solution of choosing between these depending on the contents of an env var is a 
compromise so that gamers get better gameplay by sacrificing a little of 
the visual quality and the OpenGL conformance.

There are other situations like this one. Leif checked Unreal and there 
is one (also involving alpha blending), and according to his 
experiments reverting to software leads to a severe performance hit.  
> 
> > So the solution found is to do it either like this or by software
> > depending of the value of a environment var.
> Cool! And the default version will be HW-based non-conformant, won't it?

This is very subjective, but if we assume that DRI aims to be OpenGL 
conformant, I vote for sw-based conformant...

> ...
> 
> BTW, I can seriously recommend 3ddesktop as a test tool. It supports
> several effects (blending, textures, etc. with on/off switching) so its
> behavior could give a lot of hints to the developers.

I'll check it.

> 
> Cheers,
> 
> Sergey
> 

Regards,

José Fonseca

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Sergey V. Udaltsov

> He!He!.. you missed!! It's a mix of variant 1 and 2..! :)
Cool. At least 1+2 is the answer (call it 5). Thanks.

> As Leif previously said is his reply, it can be done in hardware by 
> messing the colors of the vertex to incorporate the fog (as software Mesa 
> used to do in 3.x) but it's non-conformant to the OpenGL spec.
What's the difference in Mesa 4? Is it better? Can it be done in HW?
Will this non-conformance cause visible problems on display?

> So the solution found is to do it either like this or by software 
> depending of the value of a environment var.
Cool! And the default version will be HW-based non-conformant, won't it?

> As you can guess implementing this is not a high priority.
I see.

> Well, CPU power does matter if the user is inclined to choose the 
> conformant way and do software fallback.
If it's just one envvar - this solution will satisfy any user (except
for those who want GeForce performance for the price of a Mach64 :).

> The fact that the mach64 driver needs several software fallbacks to be 
> really OpenGL conformant is one of the reasons for my interest in 
> improving the Mesa sw rendering. (Which I must correct you, was the _real_ 
> start of the thread :)
You're right. I missed the start. And quality SW rendering is still a very
important issue.

BTW, I can seriously recommend 3ddesktop as a test tool. It supports
several effects (blending, textures, etc. with on/off switching) so its
behavior could give a lot of hints to the developers.

Cheers,

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-03 Thread José Fonseca

On 2002.04.03 14:43 Sergey V. Udaltsov wrote:
> After all these interesting and informative discussions, everyone has
> forgotten the start of the thread:) Basically, there should one answer
> to the question whether and how "blending+fog" can be implemented .

> Possible variants:
> 1. Yes, it can be done with hardware acceleration. DRI team knows how.
> 2. No, it cannot be done. DRI team knows why.
> 3. Possibly yes but actually no. ATI knows but DRI will never know (NDA
> issues)
> 4. Possibly yes but now DRI team does not know exactly how...

He!He!.. you missed!! It's a mix of variant 1 and 2..! :)

> Which answer is the correct one? After this question is answered, we
> (users/testers) could get the idea whether we'll finally have HW
> implementation or DRI will use indirect rendering here.

As Leif previously said in his reply, it can be done in hardware by 
adjusting the vertex colors to incorporate the fog (as software Mesa 
used to do in 3.x), but it's non-conformant to the OpenGL spec.

So the solution found is to do it either like this or in software, 
depending on the value of an environment variable.

As you can guess implementing this is not a high priority.
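
Presumably the check will end up looking something like the sketch below;
MACH64_CONFORMANT_FOG is a made-up name for illustration, not necessarily
what the driver will actually use.

   #include <stdlib.h>

   /* Hypothetical driver-init check: fall back to conformant software
    * fog+blending only when the user explicitly asks for it. */
   static int want_conformant_fog(void)
   {
      const char *val = getenv("MACH64_CONFORMANT_FOG");  /* invented name */
      return val != NULL && val[0] != '\0' && val[0] != '0';
   }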

> Actually, here is the point where CPU power does not really matter. What
> really matters is "possible"/"impossible" and "know"/"don't know" (sure,
> add "want/don't want":)
> 

Well, CPU power does matter if the user is inclined to choose the 
conformant way and use the software fallback.

The fact that the mach64 driver needs several software fallbacks to be 
really OpenGL conformant is one of the reasons for my interest in 
improving the Mesa sw rendering. (Which, I must correct you, was the _real_ 
start of the thread. :)

> BTW, just tested mach64 driver with 3ddesktop. _Really_ fast but there
> is a lot of artefacts and incorrect rendering. Probably, it's buggy app
> (version 0.1.2:) but not necessarily. Could anyone please try and
> comment an experience?
> 
> Cheers,
> 
> Sergey
> 

Regards,

José Fonseca

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Sergey V. Udaltsov

After all these interesting and informative discussions, everyone has
forgotten the start of the thread :) Basically, there should be one answer
to the question of whether and how "blending+fog" can be implemented.
Possible variants:
1. Yes, it can be done with hardware acceleration. DRI team knows how.
2. No, it cannot be done. DRI team knows why.
3. Possibly yes but actually no. ATI knows but DRI will never know (NDA
issues)
4. Possibly yes but now DRI team does not know exactly how...
Which answer is the correct one? After this question is answered, we
(users/testers) could get an idea of whether we'll finally have a HW
implementation or whether DRI will use indirect rendering here. 
Actually, here is the point where CPU power does not really matter. What
really matters is "possible"/"impossible" and "know"/"don't know" (sure,
add "want/don't want":)

BTW, I just tested the mach64 driver with 3ddesktop. _Really_ fast, but there
are a lot of artefacts and incorrect rendering. Probably it's a buggy app
(version 0.1.2 :) but not necessarily. Could anyone please try it and
comment on their experience?

Cheers,

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Raystonn

> >  The only thing video
> > cards have today that is really better than the main processor is
massive
> > amounts of memory bandwidth.
>
> That is far from the truth - they have internal pipelining
> and parallelism.  Their use of silicon can be optimised to balance
> the performance of just one single algorithm.  You can never do that
> for a machine that also has to run an OS, word process and run
> spreadsheets.

Modern processors have internal pipelining and parallelism as well.  Most of
the processing power of today's CPUs goes completely unused.  It is possible
to create optimized implementations of efficient algorithms using
Single-Instruction-Multiple-Data (SIMD) instructions.


> >  Since memory bandwidth is increasing rapidly,...
>
> It is?!?  Let's look at the facts:
>
> Since 1989, CPU speed has grown by a factor of 70.  Over the same
> period the memory bus has increased by a factor of maybe 6 or so.

We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
over 16 times the memory bandwidth available today than we did just 5 years
ago.  Available memory bandwidth has been growing more quickly than
processor clockspeed lately, and I do not foresee an end to this any time
soon.


> On the other hand, the graphics card can use heavily pipelined
> operations to guarantee that the memory bandwidth is 100% utilised

Overutilised in my opinion.  The amount of overdraw performed by today's
video cards on modern games and applications is incredible.  Immediate mode
rendering is an inefficient algorithm.  Video cards tend to have extremely
well optimized implementations of this inefficient algorithm.


> - and can use an arbitarily large amount of parallelism to improve
> throughput.  The main CPU can't do that because it's memory access
> patterns are not regular and it has little idea where the next byte
> has to be read from until it's too late.

Modern processors have a considerable amount of parallelism built in.  With
advanced prefetch and streaming SIMD instructions it is very possible to do
these types of operations in a modern processor.  It will, however, take
another couple of years to be able to render at great framerates and high
resolutions.


> You only have to look at the gap you are trying to bridge - a
> modern graphics card is *easily* 100 times faster at rendering
> sophisticated pixels (with pixel shaders, multiple textures and
> antialiasing) than the CPU.

They are limited in what they can do.  In order to allow more flexibility
they have recently introduced pixel shaders, which basically turn the video
card into a mini-CPU.  Modern processors can perform these operations more
quickly and would allow an order of magnitude more flexibility in what can
be done.


> > A properly
> > implemented and optimized software version of a tile-based
"scene-capture"
> > renderer much like that used in Kyro could perform as well as the latest
> > video cards in a year or two.  This is what I am dabbling with at the
> > moment.
>
> I await this with interest - but 'scene capture' systems tend to be
> unusable with modern graphics API's...they can't run either OpenGL
> or Direct3D efficiently for arbitary input.  If there were to be
> some change in consumer needs that would result in 'scene capture'
> being a usable technique - then the graphics cards can easily take
> that on board and will *STILL* beat the heck out of doing it in
> the CPU.  Scene capture is also only feasible if the number of
> polygons being rendered is small and bounded - the trends are
> for modern graphics software to generate VAST numbers of polygons
> on-the-fly precisely so they don't have to be stored in slow old
> memory.

Kyro-based video cards perform quite well.  They are not quite up to the
level of nVidia's latest cards but this is new technology and is being
worked on by a relatively new company.  These cards do not require nearly as
much memory bandwidth as immediate-mode renderers, since they perform zero overdraw.
They are processing intensive rather than bandwidth intensive.  I
see this as a more efficient algorithm.


> Everything that is speeding up the main CPU is also speeding up
> the graphics processor - faster silicon, faster busses and faster
> RAM all help the graphics just as much as they help the CPU.

Everything starts out in hardware and eventually moves to software.  There
will come a time when the basic functionality provided by video cards can be
easily done by a main processor.  The extra features offered by the video
cards, such as pixel shaders, are simply attempts to stand-in as a main
processor.  Once the basic functionality of the video card can be performed
by the main system processor, there will really be no need for extra
hardware to perform these tasks.  What I see now is a move by the video card
companies to software-based solutions (pixel shaders, etc.)  They have
recognized that there 

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Raystonn

I am definitely all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based "scene-capture"
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.
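
For readers unfamiliar with the term, the heart of a "scene-capture"
(tile-based) renderer is binning: buffer the whole frame's triangles, sort
them into screen tiles, then rasterize tile by tile so each small block of
color/depth buffer stays in fast memory. A very stripped-down sketch of the
binning step follows, with arbitrary sizes and no relation to Raystonn's
actual code:

   #define TILE_SIZE     32      /* pixels per tile side (arbitrary) */
   #define MAX_TILE_TRIS 1024    /* arbitrary per-tile cap */

   struct tile {
      int tris[MAX_TILE_TRIS];   /* indices of triangles touching the tile */
      int num_tris;
   };

   /* Add triangle 'tri' to every tile its screen-space bounding box
    * overlaps.  Rendering later walks the tiles one at a time, which is
    * what keeps the working set small and keeps overdraw out of external
    * memory. */
   static void bin_triangle(struct tile *tiles, int tiles_x, int tiles_y,
                            int tri, int min_x, int min_y, int max_x, int max_y)
   {
      int tx0 = min_x / TILE_SIZE, ty0 = min_y / TILE_SIZE;
      int tx1 = max_x / TILE_SIZE, ty1 = max_y / TILE_SIZE;
      int tx, ty;

      if (tx0 < 0) tx0 = 0;
      if (ty0 < 0) ty0 = 0;

      for (ty = ty0; ty <= ty1 && ty < tiles_y; ty++)
         for (tx = tx0; tx <= tx1 && tx < tiles_x; tx++) {
            struct tile *t = &tiles[ty * tiles_x + tx];
            if (t->num_tris < MAX_TILE_TRIS)
               t->tris[t->num_tris++] = tri;
         }
   }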

-Raystonn


- Original Message -
From: "Brian Paul" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, April 01, 2002 6:36 AM
Subject: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


"José Fonseca" wrote:
>
> In these last few days I have been working on the Mesa software blending
> and the existing MMX bug. I've made some progress.
>
> I made a small test program which calls the relevant functions directly as
> Alex suggested. In the process I added comments to the assembly code
> (which had none). The error is due to the fact that the inner loop blends
> two pixels at the same time, so if the mask of the first element is zero
> then both are skipped. I also spotted some errors in the runin section,
> e.g., it ANDs with 4 and compares the result with 8 which is impossible...
> I still have to study the x86 architecture optimization a little further
> to know how to optimally fix both these situations.
>
> I also made two optimizations in blend_transparency(s_blend.c) which have
> no effect in the result precision but that achieved a global speedup of
> 30% in the function. These optimizations are in the C code and benefit all
> architectures.
>
> The first was to avoid the repetition of the input variable in the DIV255.
> At least my version of gcc (2.96) wasn't factoring the common code out
> yelding to a 17% speedup.
>
> The second was to factor the equation of blending reducing in half the
> number of multiplications. This optimization can be applied in other
> places on this file as well.

Good work.  I'll review your changes and probably apply it to the Mesa trunk
(for version 4.1) later today.


> A third optimization that I'll try is the "double blend" trick (make two
> 8-bit multiplications at the same time in a 32-bit register) as documented
> by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite
> interesting site referred to me by Brian).

I was going to do that someday too.  Go for it.


> I would like to keep improving Mesa software rendering performance. I know
> that due to its versatility and power Mesa will never rival with a
> dedicated and non-conformant software 3d engine such as unreal one,
> nevertheless I think that it's possible to make it usefull for simple
> realtime rendering. Regards,

Despite the proliferation of 3D hardware, there'll always be applications
for software rendering.  For example, the 16-bit color channel features is
being used by several animation houses.

-Brian

___
Mesa3d-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Will Newton


> There is no sign whatever that CPU's are "catching up" with
> graphics cards - and no logical reason why they ever will.

It could however be argued that CPUs are "catching up" with the needs of a 
certain level of user. Not the hardcore gamer, but quite possibly the 
hobbyist 3D artist or 3D freeware game player. A different argument, but IMO 
a more important one.

Either way, I don't see anyone arguing we should not be improving the 
software renderer. :)

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Stephen J Baker


Gack!  I'm *so* sick of hearing this argument...

On Tue, 2 Apr 2002, Raystonn wrote:

> I am definately all for increasing the performance of the software renderer.

Yes.

> Eventually the main system processor will be fast enough to perform all of
> this without the need for a third party graphics card.

I very much doubt this will happen within the lifetime of silicon chip
technology.  Maybe with nanotech, biological or quantum computing - but
probably not even then.

>  The only thing video
> cards have today that is really better than the main processor is massive
> amounts of memory bandwidth.

That is far from the truth - they have internal pipelining
and parallelism.  Their use of silicon can be optimised to balance
the performance of just one single algorithm.  You can never do that
for a machine that also has to run an OS, word process and run
spreadsheets.

>  Since memory bandwidth is increasing rapidly,...

It is?!?  Let's look at the facts:

Since 1989, CPU speed has grown by a factor of 70.  Over the same
period the memory bus has increased by a factor of maybe 6 or so.

Caching can go some way to hiding that - but not for things like
graphics that need massive frame buffers and huge texture maps.
Caching also makes parallelism difficult and rendering algorithms
are highly parallelisable.  PC's are *horribly* memory-bound.

> I foresee the need for video cards lessening in the future.

Whilst memory bandwidth inside the main PC is increasing, it's doing
so very slowly - and all the tricks it uses to get that speedup are equally
applicable to the graphics hardware (things like DDR for example).

On the other hand, the graphics card can use heavily pipelined
operations to guarantee that the memory bandwidth is 100% utilised
- and can use an arbitrarily large amount of parallelism to improve
throughput.  The main CPU can't do that because its memory access
patterns are not regular and it has little idea where the next byte
has to be read from until it's too late.

Also, the instruction set of the main CPU isn't optimised for the
rendering task - where that is the ONLY thing the graphics chip
has to do.  The main CPU has all this legacy crap to deal with because
it's expected to run programs that were written 20 years ago.
Every generation of graphics chip can have a totally redesigned
internal architecture that exactly fits the profile of today's
RAM and silicon speeds.

You only have to look at the gap you are trying to bridge - a
modern graphics card is *easily* 100 times faster at rendering
sophisticated pixels (with pixel shaders, multiple textures and
antialiasing) than the CPU.

> A properly
> implemented and optimized software version of a tile-based "scene-capture"
> renderer much like that used in Kyro could perform as well as the latest
> video cards in a year or two.  This is what I am dabbling with at the
> moment.

I await this with interest - but 'scene capture' systems tend to be
unusable with modern graphics API's...they can't run either OpenGL
or Direct3D efficiently for arbitrary input.  If there were to be
some change in consumer needs that would result in 'scene capture'
being a usable technique - then the graphics cards can easily take
that on board and will *STILL* beat the heck out of doing it in
the CPU.  Scene capture is also only feasible if the number of
polygons being rendered is small and bounded - the trends are
for modern graphics software to generate VAST numbers of polygons
on-the-fly precisely so they don't have to be stored in slow old
memory.

Everything that is speeding up the main CPU is also speeding up
the graphics processor - faster silicon, faster busses and faster
RAM all help the graphics just as much as they help the CPU.

However, increasing the number of transistors you can have on
a chip doesn't help the CPU out very much.  Their instruction
sets are not getting more complex in proportion to the increase
in silicon area - and their ability to make use of more complex
instructions is already limited by the brain power of compiler
writers.  Most of the speedup in modern CPU's is coming from
physically shorter distances for signals to travel and faster
clocks - all of the extra gates typically end up increasing the
size of the on-chip cache which has marginal benefits to graphics
algorithms.

In contrast to that, a graphics chip designer can just double
the number of pixel processors or something and get an almost
linear increase in performance with chip area with relatively
little design effort and no software changes.

If you doubt this, look at the progress over the last 5 or 6
years.  In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
pixels/sec - it's 100 times faster.  Over the same period,
your 1996 233MHz CPU has gone up to a 2GHz machine ...a mere
10x speedup.  The graphics cards are also gaining features.
Over that same period, they added - windowing, hardware T&L,
antiali

RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Alexander Stohr

Hello Raystonn,

Sorry, but dedicated ASIC hardware is always faster.
(You are a troll, aren't you?)

In the straightforward OpenGL case (flat and smooth shading)
you can turn on several features in the pixel path and in
the geometry pipeline (culling, 8x lighting, clipping)
that you won't be able to perform at the same speed with
a normal CPU setup. It's not only the bandwidth, it's the
floating point performance that the graphics chips are
capable of by means of multiple parallel and dedicated
FPU units.

For the pixel path, when (multi)texturing is enabled, or alpha blending
or fogging or something else that does readback (stencil buffer,
depth-buffer-dependent operations, antialiased lines), then
you will find that a classical CPU and processor system will not
perform at its best when doing pixel manipulations of that sort.

I think regular graphics hardware can clear your framebuffer
in a fraction of the time that a CPU-mainboard pairing can.
That's been the case since the good old IBM VGA from ages ago.

And don't tell me a UMA architecture is better in that case.
You first have to accept that the RAMDAC is time-sharing the
same bus system and therefore permanently consumes bus cycles.
But if rasterisation has separate memory with an option for a
wider bus, separate caches and higher clocked memory, you will
get better performance by design.

Regards, Alex.


-Original Message-
From: Raystonn [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 02, 2002 19:45
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


[Resending, fell into last night's black hole it seems.]

I am definately all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based "scene-capture"
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Alexander Stohr

> > > I don't think so.  I haven't noticed a problem with fog 
> in the tunnel demo.
> > So it works for you, doesn't it? Envious.
> > For me, the fog effect does not work. Some time ago, someone (Jose?)
> > even explained that is should not work on mach64 (alpha 
> blending + some
> > other effect?) So my question was whether it should work now or not.
> 
> No, this won't fix the problem.  Mach64 can't do fog and 
> blending at the
> same time, and the tunnel demo uses blending for the menu.  
> There was some
> discussion of trying to use software fogging per-vertex when hardware
> blending is enabled, but no one has implemented it yet.

Jose was working on some MMX code that is currently disabled
in the Mesa source due to bugs in the coding. So he fixed problems
that do not come into effect in your case. With that fix
you might see some speedup with an MMX-capable CPU
if you are running specific Mesa demos on it.

Concerning the tunnel demo: as long as fogging is not required
(at least I think it is not) for rendering the alpha-blended
help texts and the other informational texts, it would be best
to just disable fogging while drawing these elements. Consider
that mode turn-off a fix for some suboptimal application coding.

(I should have a look at that source and
check whether, or why, it's not already done in that demo...)
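
In GL terms the turn-off would just bracket the overlay drawing, assuming the
demo's text/menu drawing is a separate code path, e.g.:

   /* draw the fogged 3D scene as usual ... */

   glDisable(GL_FOG);               /* fog adds nothing to 2D overlay text */
   /* ... draw the alpha-blended help/menu text here ... */
   glEnable(GL_FOG);                /* restore fog for the next frame */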

Regards, Alex.


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

> Raystonn wrote:
> 
> [Resending, fell into last night's black hole it seems.]
> 
> I am definately all for increasing the performance of the software renderer.
> Eventually the main system processor will be fast enough to perform all of
> this without the need for a third party graphics card.  The only thing video
> cards have today that is really better than the main processor is massive
> amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
> I foresee the need for video cards lessening in the future.  A properly
> implemented and optimized software version of a tile-based "scene-capture"
> renderer much like that used in Kyro could perform as well as the latest
> video cards in a year or two.  This is what I am dabbling with at the
> moment.

That's debatable.

My personal opinion is that special-purpose graphics hardware will
always perform better than a general-purpose CPU.  The graphics pipeline
is amenable to very specialized optimizations (both in computation and
the memory system) that aren't applicable to a general purpose CPU.

Of course, looking far enough into the future, all bets are off.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Raystonn



[Resending, fell into last night's black hole it seems.]
 
I am definately all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based "scene-capture"
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.

-Raystonn


- Original Message -
From: "Brian Paul" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, April 01, 2002 6:36 AM
Subject: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


"José Fonseca" wrote:
>
> In these last few days I have been working on the Mesa software blending
> and the existing MMX bug. I've made some progress.
>
> I made a small test program which calls the relevant functions directly as
> Alex suggested. In the process I added comments to the assembly code
> (which had none). The error is due to the fact that the inner loop blends
> two pixels at the same time, so if the mask of the first element is zero
> then both are skipped. I also spotted some errors in the runin section,
> e.g., it ANDs with 4 and compares the result with 8 which is impossible...
> I still have to study the x86 architecture optimization a little further
> to know how to optimally fix both these situations.
>
> I also made two optimizations in blend_transparency(s_blend.c) which have
> no effect in the result precision but that achieved a global speedup of
> 30% in the function. These optimizations are in the C code and benefit all
> architectures.
>
> The first was to avoid the repetition of the input variable in the DIV255.
> At least my version of gcc (2.96) wasn't factoring the common code out
> yelding to a 17% speedup.
>
> The second was to factor the equation of blending reducing in half the
> number of multiplications. This optimization can be applied in other
> places on this file as well.

Good work.  I'll review your changes and probably apply it to the Mesa trunk
(for version 4.1) later today.


> A third optimization that I'll try is the "double blend" trick (make two
> 8-bit multiplications at the same time in a 32-bit register) as documented
> by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite
> interesting site referred to me by Brian).

I was going to do that someday too.  Go for it.


> I would like to keep improving Mesa software rendering performance. I know
> that due to its versatility and power Mesa will never rival with a
> dedicated and non-conformant software 3d engine such as unreal one,
> nevertheless I think that it's possible to make it usefull for simple
> realtime rendering. Regards,

Despite the proliferation of 3D hardware, there'll always be applications
for software rendering.  For example, the 16-bit color channel features is
being used by several animation houses.

-Brian

___
Mesa3d-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Leif Delgass

On 2 Apr 2002, Sergey V. Udaltsov wrote:

> > I don't think so.  I haven't noticed a problem with fog in the tunnel demo.
> So it works for you, doesn't it? Envious.
> For me, the fog effect does not work. Some time ago, someone (Jose?)
> even explained that is should not work on mach64 (alpha blending + some
> other effect?) So my question was whether it should work now or not.

No, this won't fix the problem.  Mach64 can't do fog and blending at the
same time, and the tunnel demo uses blending for the menu.  There was some
discussion of trying to use software fogging per-vertex when hardware
blending is enabled, but no one has implemented it yet.

-- 
Leif Delgass 
http://www.retinalburn.net


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

"Sergey V. Udaltsov" wrote:
> 
> > I don't think so.  I haven't noticed a problem with fog in the tunnel demo.
> So it works for you, doesn't it? Envious.
> For me, the fog effect does not work. Some time ago, someone (Jose?)
> even explained that is should not work on mach64 (alpha blending + some
> other effect?) So my question was whether it should work now or not.

You didn't say anything about Mach64 in your original message - I assumed
you were talking about software rendering/blending.  I haven't tried the
Mach64 branch yet.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Sergey V. Udaltsov

> I don't think so.  I haven't noticed a problem with fog in the tunnel demo.
So it works for you, doesn't it? Envious.
For me, the fog effect does not work. Some time ago, someone (Jose?)
even explained that it should not work on mach64 (alpha blending + some
other effect?) So my question was whether it should work now or not.

Cheers,

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

"Sergey V. Udaltsov" wrote:
> 
> > In these last few days I have been working on the Mesa software blending
> > and the existing MMX bug. I've made some progress.
> Sorry for my ignorance, does this blending have anything to do with the
> incorrect fog handling in the tunnel app? Will this patch fix it?

I don't think so.  I haven't noticed a problem with fog in the tunnel demo.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Sergey V. Udaltsov

> In these last few days I have been working on the Mesa software blending 
> and the existing MMX bug. I've made some progress.
Sorry for my ignorance, does this blending have anything to do with the
incorrect fog handling in the tunnel app? Will this patch fix it?

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Mesa software blending

2002-04-01 Thread Brian Paul

"José Fonseca" wrote:
> 
> In these last few days I have been working on the Mesa software blending
> and the existing MMX bug. I've made some progress.
> 
> I made a small test program which calls the relevant functions directly as
> Alex suggested. In the process I added comments to the assembly code
> (which had none). The error is due to the fact that the inner loop blends
> two pixels at the same time, so if the mask of the first element is zero
> then both are skipped. I also spotted some errors in the runin section,
> e.g., it ANDs with 4 and compares the result with 8 which is impossible...
> I still have to study the x86 architecture optimization a little further
> to know how to optimally fix both these situations.
> 
> I also made two optimizations in blend_transparency(s_blend.c) which have
> no effect in the result precision but that achieved a global speedup of
> 30% in the function. These optimizations are in the C code and benefit all
> architectures.
> 
> The first was to avoid the repetition of the input variable in the DIV255.
> At least my version of gcc (2.96) wasn't factoring the common code out
> yelding to a 17% speedup.
> 
> The second was to factor the equation of blending reducing in half the
> number of multiplications. This optimization can be applied in other
> places on this file as well.

Good work.  I'll review your changes and probably apply it to the Mesa trunk
(for version 4.1) later today.


> A third optimization that I'll try is the "double blend" trick (make two
> 8-bit multiplications at the same time in a 32-bit register) as documented
> by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite
> interesting site referred to me by Brian).

I was going to do that someday too.  Go for it.


> I would like to keep improving Mesa software rendering performance. I know
> that due to its versatility and power Mesa will never rival with a
> dedicated and non-conformant software 3d engine such as unreal one,
> nevertheless I think that it's possible to make it usefull for simple
> realtime rendering. Regards,

Despite the proliferation of 3D hardware, there'll always be applications
for software rendering.  For example, the 16-bit color channel feature is
being used by several animation houses.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



[Dri-devel] Mesa software blending

2002-03-31 Thread José Fonseca

In these last few days I have been working on the Mesa software blending 
and the existing MMX bug. I've made some progress.

I made a small test program which calls the relevant functions directly as 
Alex suggested. In the process I added comments to the assembly code 
(which had none). The error is due to the fact that the inner loop blends 
two pixels at the same time, so if the mask of the first element is zero 
then both are skipped. I also spotted some errors in the runin section, 
e.g., it ANDs with 4 and compares the result with 8 which is impossible... 
I still have to study the x86 architecture optimization a little further 
to know how to optimally fix both these situations.
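
For reference, the scalar shape of the routine that the MMX loop has to
reproduce is roughly the following (simplified sketch, not the actual Mesa
source; 8-bit channels assumed):

   typedef unsigned char GLchan;          /* 8-bit channels assumed here */

   /* Each pixel's mask byte must be tested on its own.  The buggy MMX
    * inner loop handles two pixels per iteration but only tests the first
    * mask byte, so a zero there wrongly skips the second pixel as well. */
   static void blend_transparency_scalar(unsigned n, const unsigned char *mask,
                                         GLchan rgba[][4], const GLchan dest[][4])
   {
      unsigned i;
      int c;
      for (i = 0; i < n; i++) {
         if (mask[i]) {
            const int t = rgba[i][3];      /* incoming alpha */
            for (c = 0; c < 4; c++)        /* roughly dst + (src - dst) * t / 255 */
               rgba[i][c] = (GLchan)(dest[i][c] +
                                     ((rgba[i][c] - dest[i][c]) * t) / 255);
         }
      }
   }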

I also made two optimizations in blend_transparency(s_blend.c) which have 
no effect in the result precision but that achieved a global speedup of 
30% in the function. These optimizations are in the C code and benefit all 
architectures.

The first was to avoid the repetition of the input variable in the DIV255. 
At least my version of gcc (2.96) wasn't factoring the common code out; 
fixing this yielded a 17% speedup.

The second was to factor the equation of blending reducing in half the 
number of multiplications. This optimization can be applied in other 
places on this file as well.
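
Writing the factoring out, with s a source channel, d the destination channel
and t the alpha factor (DIV255 then stands in for the division by 255):

   s*t + d*(255 - t)  =  255*d + (s - d)*t
   =>  (s*t + d*(255 - t)) / 255  =  d + ((s - d)*t) / 255

so each channel needs one multiplication instead of two, which is what the
patch below does.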

A third optimization that I'll try is the "double blend" trick (make two 
8-bit multiplications at the same time in a 32-bit register) as documented 
by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite 
interesting site referred to me by Brian).
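
For anyone who hasn't read Herf's page, the general shape of that trick is
the common SWAR formulation below (assuming 32-bit xRGB pixels and t scaled
to 0..256; this is only the textbook variant, not necessarily the exact code
Herf gives):

   /* Blend red+blue in one multiply and green in another, by keeping the
    * channels separated by 8 zero bits so the per-channel products cannot
    * spill into each other.  Alpha is ignored here. */
   static unsigned int double_blend(unsigned int src, unsigned int dst,
                                    unsigned int t /* 0..256 */)
   {
      unsigned int rb = (((src & 0x00FF00FFu) * t +
                          (dst & 0x00FF00FFu) * (256u - t)) >> 8) & 0x00FF00FFu;
      unsigned int g  = (((src & 0x0000FF00u) * t +
                          (dst & 0x0000FF00u) * (256u - t)) >> 8) & 0x0000FF00u;
      return rb | g;
   }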


I would like to keep improving Mesa software rendering performance. I know 
that due to its versatility and power Mesa will never rival a 
dedicated and non-conformant software 3D engine such as the Unreal one; 
nevertheless I think that it's possible to make it useful for simple 
realtime rendering. Regards,

José Fonseca


Index: swrast/s_blend.c
===
RCS file: /cvsroot/mesa3d/Mesa/src/swrast/s_blend.c,v
retrieving revision 1.14
diff -u -r1.14 s_blend.c
--- swrast/s_blend.c  27 Mar 2002 15:49:27 -  1.14
+++ swrast/s_blend.c  1 Apr 2002 00:34:20 -
@@ -132,12 +132,24 @@
 #if CHAN_BITS == 8
 /* This satisfies Glean and should be reasonably fast */
 /* Contributed by Nathan Hand */
+#if 0
 #define DIV255(X)  (((X) << 8) + (X) + 256) >> 16
+#else
+   GLint temp;  /* assigned inside DIV255, so it cannot be const */
+#define DIV255(X)  (temp = (X), ((temp << 8) + temp + 256) >> 16)
+#endif
+#if 0
 const GLint s = CHAN_MAX - t;
 const GLint r = DIV255(rgba[i][RCOMP] * t + dest[i][RCOMP] * s);
 const GLint g = DIV255(rgba[i][GCOMP] * t + dest[i][GCOMP] * s);
 const GLint b = DIV255(rgba[i][BCOMP] * t + dest[i][BCOMP] * s);
 const GLint a = DIV255(rgba[i][ACOMP] * t + dest[i][ACOMP] * s);
+#else
+const GLint r = DIV255((rgba[i][RCOMP] - dest[i][RCOMP]) * t) + 
+dest[i][RCOMP];
+const GLint g = DIV255((rgba[i][GCOMP] - dest[i][GCOMP]) * t) + 
+dest[i][GCOMP];
+const GLint b = DIV255((rgba[i][BCOMP] - dest[i][BCOMP]) * t) + 
+dest[i][BCOMP];
+const GLint a = DIV255((rgba[i][ACOMP] - dest[i][ACOMP]) * t) + 
+dest[i][ACOMP]; 
+#endif
 #undef DIV255
 #elif CHAN_BITS == 16
 const GLfloat tt = (GLfloat) t / CHAN_MAXF;


Index: X86/mmx_blend.S
===
RCS file: /cvsroot/mesa3d/Mesa/src/X86/mmx_blend.S,v
retrieving revision 1.5
diff -u -r1.5 mmx_blend.S
--- X86/mmx_blend.S 28 Mar 2001 20:44:44 -  1.5
+++ X86/mmx_blend.S 1 Apr 2002 00:35:13 -
@@ -7,25 +7,35 @@
 ALIGNTEXT16
 GLOBL GLNAME(_mesa_mmx_blend_transparency)
 
+/*
+ * void blend_transparency( GLcontext *ctx,
+ *  GLuint n, 
+ *  const GLubyte mask[],
+ *  GLchan rgba[][4], 
+ *  CONST GLchan dest[][4] )
+ * 
+ * Common transparency blending mode.
+ */
 GLNAME( _mesa_mmx_blend_transparency ):
 PUSH_L( EBP )
 MOV_L ( ESP, EBP )
 SUB_L ( CONST(52), ESP )
 PUSH_L( EBX )
+
 MOV_L ( CONST(16711680), REGOFF(-8, EBP) )
 MOV_L ( CONST(16711680), REGOFF(-4, EBP) )
 MOV_L ( CONST(0), REGOFF(-16, EBP) )
 MOV_L ( CONST(-1), REGOFF(-12, EBP) )
 MOV_L ( CONST(-1), REGOFF(-24, EBP) )
 MOV_L ( CONST(0), REGOFF(-20, EBP) )
-MOV_L ( REGOFF(24, EBP), EAX )
+MOV_L ( REGOFF(24, EBP), EAX ) /* rgba */
 ADD_L ( CONST(4), EAX )
 MOV_L ( EAX, EDX )
-AND_L ( REGOFF(20, EBP), EDX )
+AND_L ( REGOFF(20, EBP), EDX ) /* mask */
 MOV_L ( EDX, EAX )
 AND_L ( CONST(4), EAX )
 CMP_L ( CONST(8), EAX )
-JNE   ( LLBL(GMBT_2) )
+JNE   ( LLBL(GMBT_no_align) )
 MOV_L ( REGOFF(20, EBP), EAX )
 ADD_L ( CONST(3), EAX )
 XOR_L ( ED