Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Raystonn

 You know what they found out with all of the hundreds of millions of dollars
 they spent?  Dedicated hardware still does it faster and cheaper.  Period.
 It's just like writing a custom routine to sort an array will pretty much
 always be faster than using the generic qsort.  When you hand-tune for a
 specific data set you will always get better performance.  This is not to
 say that the generic implementation will not perform well or even acceptably
 well, but only to say that it will never, ever, ever perform better.

Here you are comparing different algorithms.  A custom sort algorithm will
perform much better than a standard qsort.  I agree.  Implementing something
in hardware does not mean it uses a more efficient algorithm however.  A
hardware implementation is just that, an implementation.  It does not change
the underlying algorithms that are being used.  In fact, it tends to set the
algorithm in stone.  This makes it very hard to adopt new better algorithms
as they are invented.  In order to move to a better algorithm you must wait
for a hardware manufacturer to implement it and then fork out more money.

Dedicated hardware can do a limited set of things faster.  There is no way
to increase its capabilities without purchasing new hardware.  This is the
weakness of having dedicated hardware for very specific functionality.  If a
better algorithm is invented, it can take an extremely long time for it to
be brought to market, if it is at all, and it will cost yet more money.
Software has the advantage of being able to implement new algorithms much
more quickly.  If a new algorithm is found to be that much better than the
old, a software implementation of this algorithm will in fact outperform a
hardware implementation of the older algorithm.  Algorithms are at least an
order of magnitude more important than the implementation itself.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread David Bronaugh

On Thu, 11 Apr 2002 00:26:17 -0700
Raystonn [EMAIL PROTECTED] wrote:

 Here you are comparing different algorithms.  A custom sort algorithm will
 perform much better than a standard qsort.  I agree.  Implementing something
 in hardware does not mean it uses a more efficient algorithm however.  A
 hardware implementation is just that, an implementation.  It does not change
 the underlying algorithms that are being used.  In fact, it tends to set the
 algorithm in stone.  This makes it very hard to adopt new better algorithms
 as they are invented.  In order to move to a better algorithm you must wait
 for a hardware manufacturer to implement it and then fork out more money.

As far as I know, every new graphics chip out there right now is programmable - it may 
have a limited number of operands but the microcode is certainly modifiable. They 
aren't just straight ASICs.

David Bronaugh

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Raystonn

  Here you are comparing different algorithms.  A custom sort algorithm will
  perform much better than a standard qsort.  I agree.  Implementing something
  in hardware does not mean it uses a more efficient algorithm however.  A
  hardware implementation is just that, an implementation.  It does not change
  the underlying algorithms that are being used.  In fact, it tends to set the
  algorithm in stone.  This makes it very hard to adopt new better algorithms
  as they are invented.  In order to move to a better algorithm you must wait
  for a hardware manufacturer to implement it and then fork out more money.

 As far as I know, every new graphics chip out there right now is
programmable - it may have a limited number of operands but the microcode is
certainly modifiable. They aren't just straight ASICs.

The chips may (or may not, I have not double checked) be somewhat
programmable, but the arrangement of the chips in the pipeline is not.
Thus, the implementation of whatever algorithm they use can be tweaked
somewhat, but the algorithm is pretty much hard-coded.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-11 Thread Nicholas Charles Leippe

  You know what they found out with all of the hundreds of millions of dollars
  they spent?  Dedicated hardware still does it faster and cheaper.  Period.
  It's just like writing a custom routine to sort an array will pretty much
  always be faster than using the generic qsort.  When you hand-tune for a
  specific data set you will always get better performance.  This is not to
  say that the generic implementation will not perform well or even acceptably
  well, but only to say that it will never, ever, ever perform better.
 
 Here you are comparing different algorithms.  A custom sort algorithm will
 perform much better than a standard qsort.  I agree.  Implementing something
 in hardware does not mean it uses a more efficient algorithm however.  A
 hardware implementation is just that, an implementation.  It does not change
 the underlying algorithms that are being used.  In fact, it tends to set the
 algorithm in stone.  This makes it very hard to adopt new better algorithms
 as they are invented.  In order to move to a better algorithm you must wait
 for a hardware manufacturer to implement it and then fork out more money.
 
 Dedicated hardware can do a limited set of things faster.  There is no way
 to increase its capabilities without purchasing new hardware.  This is the
 weakness of having dedicated hardware for very specific functionality.  If a
 better algorithm is invented, it can take an extremely long time for it to
 be brought to market, if it is at all, and it will cost yet more money.
 Software has the advantage of being able to implement new algorithms much
 more quickly.  If a new algorithm is found to be that much better than the
 old, a software implementation of this algorithm will in fact outperform a
 hardware implementation of the older algorithm.  Algorithms are at least an
 order of magnitude more important than the implementation itself.
 
 -Raystonn

Yes.  Choosing the correct (best) algorithm for a given problem will
reduce the calculation cost with the most significance.  Yes, once
a piece of silicon is etched, it is 'in stone' as to the featureset
it provides, and yes, if you want the latest and greatest featureset
in silicon you'll always have to fork out more money.  That's how
it's always been, and will always be.

However,  none of the commodity general purpose cpus are designed for
highly parallel execution of parallelizable algorithms--which just
about every graphics operation is.  How many pixels can a 2GHz Athlon
process at a time?  Usually just one.  How many can dedicated silicon?  
Mostly limited by how many can be fetched from memory at a time.
Thus, the algorithm is _not_ always an order of magnitude more
important than the implementation itself--especially if a parallelized
implementation can provide orders of magnitude more performance than
a serial implementation of the same or an even superior algorithm.

It remains a fact that in many cases where graphics algorithms are
concerned, even less efficient algorithms implemented in a highly
parallel fashion in specialized silicon (even _old_ silicon--voodoo2)
can still significantly outperform the snazziest new algorithm
implemented serially in software on even a screaming fast general
purpose cpu.  (see the links in the thread to the comparisons of
hardware w/a voodoo2 vs software w/an athlon 1+ GHz)


Nick


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-10 Thread Ian Romanick

On Tue, Apr 09, 2002 at 06:29:54PM -0700, Raystonn wrote:

 First off, current market leaders began their hardware designs back when the
 main CPU was much much slower.  They have an investment in this technology
 and likely do not want to throw it away.  Back when these companies were
 founded, such 3d rendering could not be performed on the main processor at
 all.  The computational power of the main processor has since increased
 dramatically.  The algorithmic approach to 3d rendering should be reexamined
 with current and future hardware in mind.  What was once true may no longer
 be so.
 
 Second, if a processor intensive algorithm was capable of better efficiency
 than a bandwidth intensive algorithm, there is a good chance these
 algorithms would be moved back over to the main CPU.  If the main processor
 took over 3D rendering, what would the 3D card manufacturers sell?  It would
 put them out of business essentially.  Therefore you cannot gauge what is
 the most efficient algorithm based on what the 3D card manufacturers decide
 to push.  They will push whatever is better for their bottom line and their
 own future.

I'm getting very tired of this thread.  If modern CPUs are so much better
for 3D, then why does Intel, of all companies, still make its own 3D
hardware in addition to CPUs?!?  If the main CPU was so wonderful for 3D
rendering, Intel would be all over it.  In fact, they tried to push that
agenda once when MMX first became available.  Remember?  Had it come out
before the original Voodoo Graphics, things might have been different for a
time.

You know what they found out with all of the hundreds of millions of dollars
they spent?  Dedicated hardware still does it faster and cheaper.  Period.
It's just like writing a custom routine to sort an array will pretty much
always be faster than using the generic qsort.  When you hand-tune for a
specific data set you will always get better performance.  This is not to
say that the generic implementation will not perform well or even acceptably
well, but only to say that it will never, ever, ever perform better.
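
To make the analogy concrete, a minimal C sketch (the names here are made up):
the generic qsort() against a counting sort that is only valid because the
data is known to be single bytes.

    #include <stdlib.h>
    #include <string.h>

    /* Generic path: qsort() with a comparison callback. */
    static int cmp_bytes(const void *a, const void *b)
    {
        return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
    }

    /* Hand-tuned path: a counting sort that is only correct because we
     * know every element fits in 8 bits -- the "specific data set". */
    static void byte_sort(unsigned char *data, size_t n)
    {
        size_t count[256] = {0};
        size_t i, v, out = 0;

        for (i = 0; i < n; i++)
            count[data[i]]++;
        for (v = 0; v < 256; v++) {
            memset(data + out, (int)v, count[v]);
            out += count[v];
        }
    }

    /* qsort(data, n, 1, cmp_bytes) makes O(n log n) calls through a
     * function pointer; byte_sort(data, n) is two linear passes. */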

-- 
Tell that to the Marines!

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

  I still maintain that immediate mode rendering is an inefficient algorithm
  designed to favor the use of memory over computations.  A better algorithm
  will always win out given enough time to overtake the optimized versions of
  the more inefficient algorithms.

 Perhaps you've forgotten what you originally said?  The Kyro is a graphics
 card.

 But still, hand-waving v real-world pragmatic performance figures matter
 more, and here your Kyro and P4 lose.

 It really doesn't matter if algo (a) is better than (b). To progress
 your argument you need to prove[1] that algo (a) is at least as good as,
 and as cheap, in software on the P4 than either some other algo or the
 same one in a graphics card. Whilst still allowing that processor to
 perform other functions.

 [1] with numbers not with rhetoric.

The first paragraph (the one you chose to quote) has nothing to do with
implementing it in software.  That was an entirely different discussion.
This discussion is currently about the new topic of whether or not
scene-capture tile-based rendering is more efficient than immediate mode
rendering.  I maintain that it is, and have included my arguments in my last
post.

If you want to get back to the topic of software rendering, I would be more
than happy to oblige.  But please don't quote arguments for a point in one
debate and show them to be inadequate for proving a point in a prior debate.
The top paragraph was not intended to support any argument regarding
software rendering.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Alexander Stohr

 I agree with the "you have to read pixels back from the frame buffer and
 then continue rendering polygons".  For a hardware implementation I might
 agree with the "you need to draw more polygons than your hardware has room
 to store", but only if the hardware implementation decides to perform
 overdraw rather than fetching the triangles on the fly from AGP memory.

You need to agree that current hardware does implement the
scheme where some percentage of pixels is drawn multiple times.
It's a straightforward hardware design that nicely opens ways
to get the performance with an affordable amount of
ASIC design engineering power.  I don't assume the current
market leaders would have chosen that way if they expected to
get more performance from the other approaches.  In the end I am
pretty sure that this approach provides more room for
interesting features and effects than the mentioned one-pass
rendering would provide.

Anyway, the current memory interfaces for the framebuffer memory
aren't the performance bottleneck at all today.  It's the features that
the applications demand, e.g. n-times texturing.

If these one-pass algorithms were so resource-saving,
why is there only a single hardware implementation, and
why do the respective software solutions get so little attention?
The only reason I can see is that it does not turn out to be
as effective and performance-increasing.  To be honest, you must
subtract the preprocessing time from the rendering gain.
And you must expect the adapter not to render at full speed
because it runs idle for some time while waiting on the CPU.

 With the rest I disagree.  The Kyro, for example, has some high-speed local
 memory (cache) it uses to hold the pixels for a tile.  It can antialias and
 render translucent scenes without ever blitting the cache to the framebuffer
 more than once.  This is the advantage to tile-based rendering.  Since you
 only need to hold a tile's worth of pixels, you can use smaller high-speed
 cache.

Pixel caches and tiled framebuffers/textures are state of the art
for most (if not all) current engines.  Looking only at the Kyro
would give a false view of the market.  The Kyro has them too, so it's sort
of a "me too" product.  But a vendor's marketing department will never
tell you that this is the way it is.

 As far as the reading of pixels from the framebuffer, this is a highly
 inefficient thing to do, no matter the hardware.  If you want a fast
 application you will not attempt to read from the video card's memory.
 These operations are always extremely slow.

For this there are caches (most often generic across nearly any render unit).
And reading is not that different from writing on current RAM designs.
Some reading always works without any noticeable impact on performance
(and it is done for a good number of applications and features),
but if you need a lot of data from the framebuffer, then you might notice it.
The closer the pixel-consuming circuit is to the RAM, the better it
will work.  A CPU is one of the less suitable consumers of pixels.

 I still maintain that immediate mode rendering is an
 inefficient algorithm designed to favor the use of memory
 over computations.

Hmm, the current state of the art is called display-list-based rendering,
and it is up to date and nicely optimized even though the concept is
an older one.  It takes the good parts of both worlds: fast, overdrawing
rendering into memory, and a higher level of primitive preprocessing.
With only a single comparison on a preprocessed display list you
can quickly decide whether that display list needs to be sent to the
graphics adapter.

Just believe that performance is only at an optimum level if
you are able to take the best of the two worlds - extreme overdraw
rendering is not good for performance, nor is intense geometrical
preprocessing on a per-frame basis a viable way to performance.
The hardware industry has found nice ways of combining both of
these technologies to provide you the best of both worlds and thus
the highest performance.  And they are developing further in both of
those areas and a few others besides.

Regards, Alex.


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Gareth Hughes

Stephen J Baker wrote:
 
Everything starts out in hardware and eventually moves to software.

That's odd - I see the reverse happening.  First we had software

The move from hardware to software is an industry-wide pattern for all
technology.  It saves money.  3D video cards have been implementing new
technologies that were never used in software before.  Once the main
processor is able to handle these things, they will be moved into software.
This is just a fact of life in the computing industry.  Take a look at what
they did with Winmodems.  They removed hardware and wrote drivers to
perform the tasks.  The same thing will eventually happen in the 3D card
industry.
 
 
 That's not quite a fair comparison.

I agree.  You may want to take a look at the following article:

http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1

It shows, among other things, a 400MHz PII with a 3dfx Voodoo2 (hardware 
rasterization) getting almost double the framerate of a 1.4GHz Athlon 
doing software rendering with Quake2 -- and the software rendering is 
not even close to the quality of the hardware rendering due to all the 
shortcuts being taken.

What we are seeing, throughout the industry, is a move to programmable 
graphics engines rather than fixed-function ones.  Programmable vertex 
and fragment pipelines are not the same as a software implementation on 
a general purpose CPU, as the underlying hardware still has the special 
functionality needed for 3D graphics.  I suspect that this will continue 
to be true for a very, very long time.

-- Gareth


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

  I agree with the "you have to read pixels back from the frame buffer and
  then continue rendering polygons".  For a hardware implementation I might
  agree with the "you need to draw more polygons than your hardware has room
  to store", but only if the hardware implementation decides to perform
  overdraw rather than fetching the triangles on the fly from AGP memory.

 You need to agree that current hardware does implement the
 scheme where some percentage of pixels is drawn multiple times.
 It's a straightforward hardware design that nicely opens ways
 to get the performance with an affordable amount of
 ASIC design engineering power.  I don't assume the current
 market leaders would have chosen that way if they expected to
 get more performance from the other approaches.  In the end I am
 pretty sure that this approach provides more room for
 interesting features and effects than the mentioned one-pass
 rendering would provide.

First off, current market leaders began their hardware designs back when the
main CPU was much much slower.  They have an investment in this technology
and likely do not want to throw it away.  Back when these companies were
founded, such 3d rendering could not be performed on the main processor at
all.  The computational power of the main processor has since increased
dramatically.  The algorithmic approach to 3d rendering should be reexamined
with current and future hardware in mind.  What was once true may no longer
be so.

Second, if a processor intensive algorithm was capable of better efficiency
than a bandwidth intensive algorithm, there is a good chance these
algorithms would be moved back over to the main CPU.  If the main processor
took over 3D rendering, what would the 3D card manufacturers sell?  It would
put them out of business essentially.  Therefore you cannot gauge what is
the most efficient algorithm based on what the 3D card manufacturers decide
to push.  They will push whatever is better for their bottom line and their
own future.


 Anyway, the current memory interfaces for the framebuffer memory
 aren't the performance bottleneck at all today.  It's the features that
 the applications demand, e.g. n-times texturing.

The features of most games today do cause the current memory interfaces to
be the performance bottleneck.  This is why overclocking your card's memory
offers more of a performance gain than overclocking your card's processor.


 If these one-pass algorithms were so resource-saving,
 why is there only a single hardware implementation, and
 why do the respective software solutions get so little attention?

Why should a 3D card hardware company show interest in something that could
so easily be implemented in software?  How does that benefit their bottom
lines?


  With the rest I disagree.  The Kyro, for example, has some high-speed local
  memory (cache) it uses to hold the pixels for a tile.  It can antialias and
  render translucent scenes without ever blitting the cache to the framebuffer
  more than once.  This is the advantage to tile-based rendering.  Since you
  only need to hold a tile's worth of pixels, you can use smaller high-speed
  cache.

 Pixel caches and tiled framebuffers/textures are state of the art
 for most (if not all) current engines.  Looking only at the Kyro
 would give a false view of the market.  The Kyro has them too, so it's sort
 of a "me too" product.  But a vendor's marketing department will never
 tell you that this is the way it is.

No, tile buffers cannot be used by immediate mode renderers to eliminate
overdraw.  Immediate mode does not render on a per-pixel basis.  It renders
on a per-polygon basis.  Current hardware engines that use immediate mode
rendering in fact do not make use of tile-based rendering.  They would need
a tile buffer the size of the entire framebuffer.  At that point it is no
longer a high speed buffer.  It is simply the framebuffer.  Imagine the cost
of high-speed cache in quantities large enough to hold a full frame buffer,
especially at high resolutions...

While I would prefer to see a software implementation of scene-capture
tile-based rendering, the Kyro was a good first step.  It was the first
mainstream card to use these algorithms.  For this I applaud them.  This was
by no means a "me too" product as you claimed.


  As far as the reading of pixels from the framebuffer, this is a highly
  inefficient thing to do, no matter the hardware.  If you want a fast
  application you will not attempt to read from the video card's memory.
  These operations are always extremely slow.

 For this there are caches (most often generic across nearly any render unit).
 And reading is not that different from writing on current RAM designs.
 Some reading always works without any noticeable impact on performance
 (and it is done for a good number of applications and features),
 but if you need a lot of data from the framebuffer, then you might notice it.
 The closer the pixel-consuming circuit is to the RAM, the better it
 

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

  With the rest I disagree.  The Kyro, for example, has some high-speed local
  memory (cache) it uses to hold the pixels for a tile.  It can antialias and
  render translucent scenes without ever blitting the cache to the framebuffer
  more than once.

 It can't have infinite storage for tile information - so there would have to
 be a hard limit on the number of translucent/edge/intersecting tiles - that
 would not be OpenGL compliant.

Each tile has a list of all polygons that might be drawn on its pixel cache.
For a hardware implementation, memory could become an issue.  If memory gets
tight, the implementation could render all polygons currently in its tile
lists and clear out tile memory.  This would trade off memory space for a
bit of overdraw.  For a software implementation this would really not be a
problem.
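
As a rough C sketch of the tile lists and the flush-when-memory-gets-tight
idea described above (hypothetical structures and sizes, not actual Kyro or
Mesa code):

    #define TILE_W             32
    #define TILE_H             32
    #define MAX_TRIS_PER_TILE  256   /* hypothetical capture budget */

    struct tile {
        int num_tris;
        int tri_ids[MAX_TRIS_PER_TILE];   /* indices into the captured scene */
    };

    /* Bin one triangle into every tile its screen bounding box touches.
     * Returns 0 when a tile list is full: the caller then renders all
     * polygons currently in the tile lists, clears them, and continues --
     * trading a little overdraw for bounded memory, as described above. */
    static int bin_triangle(struct tile *tiles, int tiles_x, int tiles_y,
                            int tri_id, int minx, int miny, int maxx, int maxy)
    {
        int tx, ty;

        for (ty = miny / TILE_H; ty <= maxy / TILE_H && ty < tiles_y; ty++)
            for (tx = minx / TILE_W; tx <= maxx / TILE_W && tx < tiles_x; tx++) {
                struct tile *t = &tiles[ty * tiles_x + tx];
                if (t->num_tris == MAX_TRIS_PER_TILE)
                    return 0;                 /* flush point */
                t->tri_ids[t->num_tris++] = tri_id;
            }
        return 1;
    }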


  This is the advantage to tile-based rendering.  Since you
  only need to hold a tile's worth of pixels, you can use smaller
high-speed
  cache.

 Only if the number of visible layers is small enough.

What does the number of visible layers have to do with the ability to break
down processing to a per-tile basis?  I am not following here.


  As far as the reading of pixels from the framebuffer, this is a highly
  inefficient thing to do, no matter the hardware.  If you want a fast
  application you will not attempt to read from the video card's memory.
  These operations are always extremely slow.

 They are only slow if the card doesn't implement them well - but there
 are plenty of techniques (eg impostering) that rely on this kind of thing.

This will always be slow.  AGP transfers are inherently slow compared to
everything else.  DMA transfers are usually used for writing to the video
card, usually passing it vertices and textures.  These transfers can be done
in the background.  How many video cards implement DMA transfers for reads
from the video card?  Even if this was done, most of the time you are
waiting for the results of the read in order to perform another operation.
What good is pushing something into the background when you must wait for
that operation to complete?  These slow transfers and all the waiting make
this an extremely slow process.  It is not recommended.
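
For example, even a one-pixel readback through the standard OpenGL call stalls
the CPU for the full round trip (shown only to illustrate the wait being
described):

    #include <GL/gl.h>

    /* glReadPixels() cannot return until every queued command affecting
     * the framebuffer has finished and the data has travelled back across
     * the bus, so the CPU sits idle for the whole round trip. */
    static void read_one_pixel(int x, int y, unsigned char rgba[4])
    {
        glReadPixels(x, y, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    }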

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

 I agree.  You may want to take a look at the following article:

 http://www.tech-report.com/reviews/2001q2/tnl/index.x?pg=1

 It shows, among other things, a 400MHz PII with a 3dfx Voodoo2 (hardware
 rasterization) getting almost double the framerate of a 1.4GHz Athlon
 doing software rendering with Quake2 -- and the software rendering is
 not even close to the quality of the hardware rendering due to all the
 shortcuts being taken.

A software implementation of an immediate mode renderer would indeed be
extremely slow.  The main CPU does not yet have access to the kinds of
memory bandwidth that a 3D card does.  I believe a software implementation
of a scene-capture tile-renderer would have much better results.  This is a
more computationally expensive, less bandwidth-intensive algorithm which is
more suited to a CPU's environment.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Damien Miller

On Tue, 9 Apr 2002, Raystonn wrote:

 If you want to get back to the topic of software rendering, I would be more
 than happy to oblige.

Better yet, if you are serious - how about furthering your argument with
patches to optimise and improve Mesa's software paths?



___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-09 Thread Raystonn

  If you want to get back to the topic of software rendering, I would be more
  than happy to oblige.

 Better yet, if you are serious - how about furthering your argument with
 patches to optimise and improve Mesa's software paths?

Patches will not do the job.  My ideas include a change in algorithm, not
implementation.  This would involve a huge redesign.  I have SGI's SI and
have been mucking about with it attempting to bring some kind of order to
it.  Once I complete that I will likely start over from scratch and
implement a new design based on scene-capturing and some algorithms I have
created.  Of course I will ensure it passes conformance tests in the end.
What good is an OpenGL implementation if it does not work as advertised? ;)

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Stephen J Baker

On Thu, 4 Apr 2002, Raystonn wrote:

 The games perform overdraw, sure.  But I am talking about at the pixel
 level.  A scene-capture algorithm performs 0 overdraw, regardless of what
 the game sends it.

That's not true.  I've designed and built machines like this and I know.

You still need overdraw when:

  * you are antialiasing a polygon edge.
  * you are rendering translucent surfaces.
  * you need more textures on a polygon than your
hardware can render in a single pass.
  * you have to read pixels back from the frame buffer and then
continue rendering polygons.
  * polygons get smaller than a pixel in width or height.
  * you need to draw more polygons than your hardware has
room to store.

...I'm sure there are other reasons too.

  This reduces fillrate needs greatly.

It reduces it (in my experience) by a factor of between 2 and 4 depending
on the nature of the scene.  You can easily invent scenes that show much
more benefit - but they tend to be contrived cases that don't crop up
much in real applications because of things like portal culling.

  Also, in order to use scene capture, you are reliant on the underlying
  graphics API to be supportive of this technique.  Neither OpenGL nor
  Direct3D are terribly helpful.

 Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL.

They do - but they perform extremely poorly for OpenGL programs that
do anything much more complicated than just throwing a pile of polygons
at the display.  As soon as you get into reading back pixels for any
reason, any scene-capture system has to render the polygons it has
before the program can access the pixels in the frame buffer.

   Everything starts out in hardware and eventually moves to software.
 
  That's odd - I see the reverse happening.  First we had software

 The move from hardware to software is an industry-wide pattern for all
 technology.  It saves money.  3D video cards have been implementing new
 technologies that were never used in software before.  Once the main
 processor is able to handle these things, they will be moved into software.
 This is just a fact of life in the computing industry.  Take a look at what
 they did with Winmodems.  They removed hardware and wrote drivers to
 perform the tasks.  The same thing will eventually happen in the 3D card
 industry.

That's not quite a fair comparison.

Modems can be moved into software because there is no need for them *EVER*
to get any faster.  All modern modems can operate faster than any standard
telephone line and are in essence *perfect* devices that cannot be improved
upon in any way.  Hence a hardware modem that would run MUCH faster than the
CPU would be easy to build - but we don't because it's just not useful.
That artificial limit on the speed of a modem is the only thing that allows
software to catch up with hardware and make it obsolete.

We might expect sound cards to go the same way - once they get fast enough
to produce any conceivable audio experience that the human perceptual
system can comprehend - then there is a chance for software audio to catch
up.  That hasn't happened yet - which is something I find rather surprising.

But that's in no way analogous to the graphics situation where we'll continue
to need more performance until the graphics you can draw are completely
photo-realistic - indistinguishable from the real world - and operate over
the complete visual field at eye-limiting resolution.  We are (in my
estimation) still at least three orders of magnitude in performance
away from that pixel fill rate and far from where we need to be in
terms of realism and polygon rates.


Steve Baker  (817)619-2657 (Vox/Vox-Mail)
L3Com/Link Simulation  Training (817)619-2466 (Fax)
Work: [EMAIL PROTECTED]   http://www.link.com
Home: [EMAIL PROTECTED]   http://www.sjbaker.org


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Raystonn

  The games perform overdraw, sure.  But I am talking about at the pixel
  level.  A scene-capture algorithm performs 0 overdraw, regardless of what
  the game sends it.

 That's not true.  I've designed and built machines like this and I know.

 You still need overdraw when:

   * you are antialiasing a polygon edge.
   * you are rendering translucent surfaces.
   * you need more textures on a polygon than your
 hardware can render in a single pass.
   * you have to read pixels back from the frame buffer and then
 continue rendering polygons.
   * polygons get smaller than a pixel in width or height.
   * you need to draw more polygons than your hardware has
 room to store.

I agree with the "you have to read pixels back from the frame buffer and
then continue rendering polygons".  For a hardware implementation I might
agree with the "you need to draw more polygons than your hardware has room
to store", but only if the hardware implementation decides to perform
overdraw rather than fetching the triangles on the fly from AGP memory.

With the rest I disagree.  The Kyro, for example, has some high-speed local
memory (cache) it uses to hold the pixels for a tile.  It can antialias and
render translucent scenes without ever blitting the cache to the framebuffer
more than once.  This is the advantage to tile-based rendering.  Since you
only need to hold a tile's worth of pixels, you can use smaller high-speed
cache.
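
As a rough C sketch of what such a per-tile resolve looks like - depth settled
entirely in a tile-sized buffer, each framebuffer pixel written exactly once
(the helpers are hypothetical stand-ins, not the Kyro's actual pipeline):

    #define TILE_W  32
    #define TILE_H  32

    /* Hypothetical helpers standing in for coverage/depth interpolation
     * and for texturing/shading. */
    extern int          tri_covers(int tri_id, int px, int py, float *z_out);
    extern unsigned int shade_pixel(int tri_id, int px, int py);

    /* Resolve one tile: hidden-surface removal happens in the small
     * on-chip-sized buffers, each surviving pixel is shaded once, and the
     * finished tile is written to the framebuffer in a single pass. */
    static void resolve_tile(const int *tri_ids, int num_tris,
                             unsigned int *framebuffer, int fb_pitch,
                             int tile_x, int tile_y)
    {
        float depth[TILE_H][TILE_W];
        int   owner[TILE_H][TILE_W];      /* which triangle won each pixel */
        int   x, y, i;

        for (y = 0; y < TILE_H; y++)
            for (x = 0; x < TILE_W; x++) {
                depth[y][x] = 1.0f;       /* far plane */
                owner[y][x] = -1;
            }

        /* Pass 1: depth-only; no texturing, no framebuffer traffic. */
        for (i = 0; i < num_tris; i++)
            for (y = 0; y < TILE_H; y++)
                for (x = 0; x < TILE_W; x++) {
                    float z;
                    if (tri_covers(tri_ids[i], tile_x + x, tile_y + y, &z) &&
                        z < depth[y][x]) {
                        depth[y][x] = z;
                        owner[y][x] = tri_ids[i];
                    }
                }

        /* Pass 2: shade each visible pixel once, blit the tile out once. */
        for (y = 0; y < TILE_H; y++)
            for (x = 0; x < TILE_W; x++)
                if (owner[y][x] >= 0)
                    framebuffer[(tile_y + y) * fb_pitch + (tile_x + x)] =
                        shade_pixel(owner[y][x], tile_x + x, tile_y + y);
    }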

As far as the reading of pixels from the framebuffer, this is a highly
inefficient thing to do, no matter the hardware.  If you want a fast
application you will not attempt to read from the video card's memory.
These operations are always extremely slow.

I still maintain that immediate mode rendering is an inefficient algorithm
designed to favor the use of memory over computations.  A better algorithm
will always win out given enough time to overtake the optimized versions of
the more inefficient algorithms.

-Raystonn


Sponsored by http://www.ThinkGeek.com/

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-08 Thread Allen Akin

On Mon, Apr 08, 2002 at 06:17:59PM -0700, Raystonn wrote:
| As far as the reading of pixels from the framebuffer, this is a highly
| inefficient thing to do, no matter the hardware.

It doesn't have to be; that's just a tradeoff made by the hardware
designers depending on the applications for which their systems are
intended.

Reading previously-rendered pixels is useful for things like
dynamically-constructed environment maps, shadow maps, correction for
projector optics, film compositing, and parallel renderers.  There are
various ways hardware can assist these operations, and various ways
tiled renderers interact with them, but that discussion is too lengthy
for this note.  At any rate, the ability to use the results of previous
renderings is a pretty important capability.

| I still maintain that immediate mode renderering is an inefficient algorithm
| designed to favor the use of memory over computations.

An important design characteristic of immediate mode is that it allows
the application to determine the rendering order.  This helps achieve
certain rendering effects (such as those Steve described earlier), but
it can also be a *huge* efficiency win if the scene involves expensive
mode changes, such as texture loads/unloads.  Check out the original
Reyes paper for a good quantitative discussion of this sort of issue.
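
A small C sketch of the kind of win being described - the application sorting
its own submission order by texture so each expensive bind happens once
(hypothetical structures; the point is only that immediate mode leaves the
ordering in the application's hands):

    #include <stdlib.h>

    struct draw_call {
        unsigned texture_id;          /* the expensive state to switch */
        /* ... vertices, etc. */
    };

    /* Hypothetical stand-ins for the actual bind and draw paths. */
    extern void bind_texture(unsigned texture_id);
    extern void draw(const struct draw_call *call);

    static int by_texture(const void *a, const void *b)
    {
        unsigned ta = ((const struct draw_call *)a)->texture_id;
        unsigned tb = ((const struct draw_call *)b)->texture_id;
        return (ta > tb) - (ta < tb);
    }

    /* Because the application controls submission order, it can pay for
     * each texture load once per frame instead of once per object. */
    static void submit_sorted(struct draw_call *calls, size_t n)
    {
        unsigned bound = ~0u;
        size_t i;

        qsort(calls, n, sizeof *calls, by_texture);
        for (i = 0; i < n; i++) {
            if (calls[i].texture_id != bound) {
                bind_texture(calls[i].texture_id);
                bound = calls[i].texture_id;
            }
            draw(&calls[i]);
        }
    }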

Allen

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread Ian Romanick

On Thu, Apr 04, 2002 at 09:30:39PM -0800, Raystonn wrote:

  OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.
 
 No, in this 5 year period, processor clockspeed has moved from approximately
 200MHz to over 2GHz.  This is a factor of 10 in CPU growth and 16 in memory
 bandwidth.  Memory bandwidth is growing more quickly than processor
 clockspeed now.

Uhm...you've fallen into Intel's clockspeed trap.  I'm assuming that you're
talking about a 200MHz Pentium vs. a 2GHz Pentium4.  In the best case, the
Pentium could issue two instructions at once, whereas a Pentium4 or Athlon
can issue (and retire) many, many more.  Not only that, the cycle times of
many instructions (such as multiply and divide) have decreased.  A 2GHz
Pentium would still be crushed by a 2GHz Pentium4.  Just like a 200MHz
Pentium would crush a 200MHz 286. :)

-- 
Tell that to the Marines!

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-05 Thread David Bronaugh

On Fri, 5 Apr 2002 17:11:26 -0800
Raystonn [EMAIL PROTECTED] wrote:
 Yes, some details were left out of CPU performance increases.  The same was
 done for memory performance increases though.  We have been discussing
 memory bandwidth as memory performance, completely leaving out memory
 latency, which has also improved tremendously.

Pardon me, but I haven't seen this wonderful improvement.

I benchmarked several machines a while back:
- a P200 with a TX chipset board and EDO DRAM
- a P2-266 with an LX chipset and PC-66 SDRAM
- a K6-III/550 with a Via MVP3 chipset and PC-100 SDRAM

The P200 pulled off about 75 MBytes/sec; the P2-266 pulled off about 55 MBytes/sec; the
K6-III/550 pulled off about 100 MBytes/sec.  All of this was done under Linux; the tests
were performed with memtest86 (I think - it's been a while), though basically they were
not performed under any operating system other than what was on the boot floppy.

This doesn't support your conclusions here. I would hazard a guess that memory 
performance there had more to do with the chipset involved than superior memory 
technology.

David Bronaugh

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Stephen J Baker

On Tue, 2 Apr 2002, Raystonn wrote:

  That is far from the truth - they have internal pipelining
  and parallelism.  Their use of silicon can be optimised to balance
  the performance of just one single algorithm.  You can never do that
  for a machine that also has to run an OS, word process and run
  spreadsheets.

 Modern processors have internal pipelining and parallelism as well.

Yes - and yet they still have horrible problems every time you have
a conditional branch instruction.  That's because they are trying
to convert a highly linear operation (code execution) into some
kind of a parallel form.  Graphics is easier though.  Each pixel and
each polygon can be treated as a stand-alone entity and can be
processed in true parallelism.

 Most of the processing power of today's CPUs goes completely unused.  It is
 possible to create optimized implementations using Single-Instruction-Multiple-Data
 (SIMD) instructions of efficient algorithms.

Which is a way of saying "Yes, you could do fast graphics on the CPU
if you put the GPU circuitry onto the CPU chip and pretend that it's
now part of the core CPU."

I'll grant you *that* - but it's not the same thing as doing the
graphics in software.

  Since 1989, CPU speed has grown by a factor of 70.  Over the same
  period the memory bus has increased by a factor of maybe 6 or so.

 We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
 to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
 over 16 times the memory bandwidth available today than we did just 5 years
 ago.  Available memory bandwidth has been growing more quickly than
 processor clockspeed lately, and I do not foresee an end to this any time
 soon.

OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.
My argument remains - and remember that whenever RAM gets faster,
so do the graphics cards.  You can run faster - but you can't catch up
if the other guy is also running faster.

  On the other hand, the graphics card can use heavily pipelined
  operations to guarantee that the memory bandwidth is 100% utilised

 Overutilised in my opinion.  The amount of overdraw performed by today's
 video cards on modern games and applications is incredible.  Immediate mode
 rendering is an inefficient algorithm.  Video cards tend to have extremely
 well optimized implementations of this inefficient algorithm.

That's because games *NEED* to do lots of overdraw.  They are actually
pretty smart about eliminating the 'obvious' cases by doing things
like portal culling.  Most of the overdraw comes from needing to do
multipass rendering (IIRC, the new Return To Castle Wolfenstein game
uses up to 12 passes to render some polygons).  The overdraw due to
that kind of thing is rather harder to eliminate with algorithmic
sophistication.  If you need that kind of surface quality, your
bandwidth out of memory will be high no matter what.

 Kyro-based video cards perform quite well.  They are not quite up to the
 level of nVidia's latest cards...

Not *quite*!!! Their best card is significantly slower than
a GeForce 2MX - that's four generations of nVidia technology
ago.

I agree that if this algorithm were to be implemented on a card
with the *other* capabilities of an nVidia card - then it would improve
the fill rate by perhaps a factor of two or four. (Before you argue
about that - realise that I've designed *and* built hardware and software
using this technology - and I've MEASURED its performance for 'typical'
scenes).

But you can only draw scenes where the number of polygons being rendered
can fit into the 'scene capture' buffer.  And that's the problem with
that technology.

If I want to draw a scene with a couple of million polygons in it (perfectly
possible with modern cards) then those couple of million polygons have
to be STORED ON THE GRAPHICS CARD.  That's a big problem for an affordable
graphics card.

Adding another 128Mb of fast RAM to store the scene in costs a lot more
than doubling the amount of processing power on the GPU.  The amount of
RAM on the chip becomes a major cost driver for a $120 card.

None of those issues affect a software solution though - and it's
possible that a scene capture solution *could* be better than a
conventional immediate mode renderer - but I still think that
it will at MOST only buy you a factor of 2x or 4x pixel rate speedup
and you have a MUCH larger gap than that to hurdle.

Also, in order to use scene capture, you are reliant on the underlying
graphics API to be supportive of this technique.  Neither OpenGL nor
Direct3D are terribly helpful.  You can write things like:

   Render 100 polygons.
   Read back the image they created.

   if the pixel at (123,456) is purple then
   {
 put that image into texture memory.

 Render another 100 polygons using the texture
 you just created.
   }
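
In concrete terms, that pattern maps onto standard OpenGL 1.x calls roughly as
follows (draw_first_batch/draw_second_batch are hypothetical stand-ins for the
"100 polygons"):

    #include <GL/gl.h>

    /* Hypothetical stand-ins for the two batches of polygons. */
    extern void draw_first_batch(void);
    extern void draw_second_batch(void);

    static void readback_then_texture(GLuint tex)
    {
        static GLubyte image[512 * 512 * 4];
        const GLubyte *p;

        draw_first_batch();

        /* This read forces any deferred/scene-capture renderer to resolve
         * everything drawn so far before the CPU can look at the result. */
        glReadPixels(0, 0, 512, 512, GL_RGBA, GL_UNSIGNED_BYTE, image);

        p = &image[(456 * 512 + 123) * 4];
        if (p[0] > 200 && p[2] > 200 && p[1] < 64) {    /* roughly "purple" */
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, image);
            draw_second_batch();
        }
    }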

...scene capture algorithms have a very hard time with things like
that because you can only read back the image 

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-04 Thread Raystonn

 Yes - and yet they still have horrible problems every time you have
 a conditional branch instruction.  That's because they are trying

Not really.  The Pentium 4 has a very efficient branch prediction unit.
Most of the time it guesses the correct branch to take.  When the actual
branch is computed, it stores this information for later.  Next time that
branch is encountered it analyzes the stored information and bases its
decision on that.  Conditional branches are much less of a problem now.


  Most of the processing power of today's CPUs goes completely unused.  It is
  possible to create optimized implementations using Single-Instruction-Multiple-Data
  (SIMD) instructions of efficient algorithms.

 Which is a way of saying "Yes, you could do fast graphics on the CPU
 if you put the GPU circuitry onto the CPU chip and pretend that it's
 now part of the core CPU."

What does this have to do with adding GPU hardware to the CPU?  These SIMD
instructions are already present on modern processors in the form of SSE and
SSE2.


  We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
  to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
  over 16 times the memory bandwidth available today than we did just 5 years
  ago.  Available memory bandwidth has been growing more quickly than
  processor clockspeed lately, and I do not foresee an end to this any time
  soon.

 OK - so a factor 70 in CPU growth and a factor of 16 in RAM speed.

No, in this 5 year period, processor clockspeed has moved from approximately
200MHz to over 2GHz.  This is a factor of 10 in CPU growth and 16 in memory
bandwidth.  Memory bandwidth is growing more quickly than processor
clockspeed now.


  Overutilised in my opinion.  The amount of overdraw performed by today's
  video cards on modern games and applications is incredible.  Immediate mode
  rendering is an inefficient algorithm.  Video cards tend to have extremely
  well optimized implementations of this inefficient algorithm.

 That's because games *NEED* to do lots of overdraw.  They are actually

The games perform overdraw, sure.  But I am talking about at the pixel
level.  A scene-capture algorithm performs 0 overdraw, regardless of what
the game sends it.  This reduces fillrate needs greatly.


  Kyro-based video cards perform quite well.  They are not quite up to the
  level of nVidia's latest cards...

 Not *quite*!!! Their best card is significantly slower than
 a GeForce 2MX - that's four generations of nVidia technology
 ago.

This has nothing to do with the algorithms itself.  It merely has to do with
the company's ability to scale its hardware.  A software implementation
would not be limited in this manner.  It could take advantage of the
processor manufacturer's ability to scale speeds much more easily.


 Also, in order to use scene capture, you are reliant on the underlying
 graphics API to be supportive of this technique.  Neither OpenGL nor
 Direct3D are terribly helpful.

Kyro-based 'scene-capture' video cards support both Direct3D and OpenGL.
Any game you can play using an nVidia card you can also play using a
Kyro-based card.


   Everything that is speeding up the main CPU is also speeding up
   the graphics processor - faster silicon, faster busses and faster
   RAM all help the graphics just as much as they help the CPU.
 
  Everything starts out in hardware and eventually moves to software.

 That's odd - I see the reverse happening.  First we had software

The move from hardware to software is an industry-wide pattern for all
technology.  It saves money.  3D video cards have been implementing new
technologies that were never used in software before.  Once the main
processor is able to handle these things, they will be moved into software.
This is just a fact of life in the computing industry.  Take a look at what
they did with Winmodems.  They removed hardware and wrote drivers to
perform the tasks.  The same thing will eventually happen in the 3D card
industry.


 As CPU's get faster, graphics cards get *MUCH* faster.

This has mostly to do with memory bandwidth.  The processors on the video
cards are not all that impressive by themselves.  Memory bandwidth available
to the CPU is increasing rapidly.


 CPU's aren't catching up - they are getting left behind.

I disagree.  What the CPU lacks in hardware units it makes up with sheer
clockspeed.  A video card may be able to perform 10 times as many operations
per clock cycle as a CPU.  But if that CPU is operating at over 10 times the
clockspeed, who cares?  It will eventually be faster.  Video card
manufactures cannot scale clockspeed anywhere near as well as Intel.


 They are adding in steps to the graphics processing that are programmable.

And this introduces the same problems that the main CPU is much better at
dealing with.  Branch prediction and other software issues have been highly
optimized in the main processor.  Video cards cannot deal with software

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Raystonn

   The only thing video
  cards have today that is really better than the main processor is massive
  amounts of memory bandwidth.

 That is far from the truth - they have internal pipelining
 and parallelism.  Their use of silicon can be optimised to balance
 the performance of just one single algorithm.  You can never do that
 for a machine that also has to run an OS, word process and run
 spreadsheets.

Modern processors have internal pipelining and parallelism as well.  Most of
the processing power of today's CPUs goes completely unused.  It is possible
to create optimized implementations using Single-Instruction-Multiple-Data
(SIMD) instructions of efficient algorithms.


   Since memory bandwidth is increasing rapidly,...

 It is?!?  Let's look at the facts:

 Since 1989, CPU speed has grown by a factor of 70.  Over the same
 period the memory bus has increased by a factor of maybe 6 or so.

We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years.  We have
over 16 times the memory bandwidth available today than we did just 5 years
ago.  Available memory bandwidth has been growing more quickly than
processor clockspeed lately, and I do not foresee an end to this any time
soon.
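
Spelling out the arithmetic behind those figures (peak rates, and assuming
PC800-style RDRAM channels):

    dual-channel RDRAM:  2 channels x 16 bits x 800 MT/s = 2 x 1.6 GB/s = 3.2 GB/s
    growth:              3.2 GB/s / 0.2 GB/s = 16x in roughly 5 years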


 On the other hand, the graphics card can use heavily pipelined
 operations to guarantee that the memory bandwidth is 100% utilised

Overutilised in my opinion.  The amount of overdraw performed by today's
video cards on modern games and applications is incredible.  Immediate mode
rendering is an inefficient algorithm.  Video cards tend to have extremely
well optimized implementations of this inefficient algorithm.


 - and can use an arbitarily large amount of parallelism to improve
 throughput.  The main CPU can't do that because it's memory access
 patterns are not regular and it has little idea where the next byte
 has to be read from until it's too late.

Modern processors have a considerable amount of parallelism built in.  With
advanced prefetch and streaming SIMD instructions it is very possible to do
these types of operations in a modern processor.  It will, however, take
another couple of years to be able to render at great framerates and high
resolutions.
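
For instance, a minimal sketch of that kind of SIMD work using SSE2 intrinsics
- a saturating additive blend that touches 16 packed 8-bit colour channels
(4 RGBA pixels) per instruction (illustrative only, not Mesa code):

    #include <emmintrin.h>   /* SSE2 */
    #include <stddef.h>

    /* Additive blend of RGBA8 pixels, 4 pixels per iteration.  Assumes
     * n_pixels is a multiple of 4 and both buffers are 16-byte aligned. */
    static void add_blend_sse2(unsigned char *dst, const unsigned char *src,
                               size_t n_pixels)
    {
        size_t i;

        for (i = 0; i < n_pixels * 4; i += 16) {
            __m128i d = _mm_load_si128((const __m128i *)(dst + i));
            __m128i s = _mm_load_si128((const __m128i *)(src + i));
            _mm_store_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
        }
    }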


 You only have to look at the gap you are trying to bridge - a
 modern graphics card is *easily* 100 times faster at rendering
 sophisticated pixels (with pixel shaders, multiple textures and
 antialiasing) than the CPU.

They are limited in what they can do.  In order to allow more flexibility
they have recently introduced pixel shaders, which basically turns the video
card into a mini-CPU.  Modern processors can perform these features more
quickly and would allow an order of magnitude more flexibility in what can
be done.


  A properly
  implemented and optimized software version of a tile-based scene-capture
  renderer much like that used in Kyro could perform as well as the latest
  video cards in a year or two.  This is what I am dabbling with at the
  moment.

 I await this with interest - but 'scene capture' systems tend to be
 unusable with modern graphics API's...they can't run either OpenGL
 or Direct3D efficiently for arbitrary input.  If there were to be
 some change in consumer needs that would result in 'scene capture'
 being a usable technique - then the graphics cards can easily take
 that on board and will *STILL* beat the heck out of doing it in
 the CPU.  Scene capture is also only feasible if the number of
 polygons being rendered is small and bounded - the trends are
 for modern graphics software to generate VAST numbers of polygons
 on-the-fly precisely so they don't have to be stored in slow old
 memory.

Kyro-based video cards perform quite well.  They are not quite up to the
level of nVidia's latest cards but this is new technology and is being
worked on by a relatively new company.  These cards do not require nearly as
much memory bandwidth as immediate-mode renderers, performing 0 overdraw.
They are more processing intensive rather than being bandwidth intensive.  I
see this as a more efficient algorithm.


 Everything that is speeding up the main CPU is also speeding up
 the graphics processor - faster silicon, faster busses and faster
 RAM all help the graphics just as much as they help the CPU.

Everything starts out in hardware and eventually moves to software.  There
will come a time when the basic functionality provided by video cards can be
easily done by a main processor.  The extra features offered by the video
cards, such as pixel shaders, are simply attempts to stand-in as a main
processor.  Once the basic functionality of the video card can be performed
by the main system processor, there will really be no need for extra
hardware to perform these tasks.  What I see now is a move by the video card
companies to software-based solutions (pixel shaders, etc.)  They have
recognized that there are limitations to what specialized hardware can do
and 

RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-03 Thread Sergey V. Udaltsov

After all these interesting and informative discussions, everyone has
forgotten the start of the thread :)  Basically, there should be one answer
to the question of whether and how blending+fog can be implemented.
Possible variants:
1. Yes, it can be done with hardware acceleration. The DRI team knows how.
2. No, it cannot be done. The DRI team knows why.
3. Possibly yes, but actually no. ATI knows, but DRI will never know (NDA
issues).
4. Possibly yes, but right now the DRI team does not know exactly how...
Which answer is the correct one? After this question is answered, we
(users/testers) could get an idea of whether we'll finally have a HW
implementation or whether DRI will use indirect rendering here.
Actually, this is the point where CPU power does not really matter. What
really matters is possible/impossible and know/don't know (sure,
add want/don't want :)

BTW, I just tested the mach64 driver with 3ddesktop. _Really_ fast, but there
are a lot of artefacts and incorrect rendering. Probably it's a buggy app
(version 0.1.2 :) but not necessarily. Could anyone please try it and
comment on their experience?

Cheers,

Sergey

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

Sergey V. Udaltsov wrote:
 
  In these last few days I have been working on the Mesa software blending
  and the existing MMX bug. I've made some progress.
 Sorry for my ignorance, does this blending have anything to do with the
 incorrect fog handling in the tunnel app? Will this patch fix it?

I don't think so.  I haven't noticed a problem with fog in the tunnel demo.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

Sergey V. Udaltsov wrote:
 
  I don't think so.  I haven't noticed a problem with fog in the tunnel demo.
 So it works for you, doesn't it? Envious.
 For me, the fog effect does not work. Some time ago, someone (Jose?)
 even explained that is should not work on mach64 (alpha blending + some
 other effect?) So my question was whether it should work now or not.

You didn't say anything about Mach64 in your original message - I assumed
you were talking about software rendering/blending.  I haven't tried the
Mach64 branch yet.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Leif Delgass

On 2 Apr 2002, Sergey V. Udaltsov wrote:

  I don't think so.  I haven't noticed a problem with fog in the tunnel demo.
 So it works for you, doesn't it? Envious.
 For me, the fog effect does not work. Some time ago, someone (Jose?)
 even explained that it should not work on mach64 (alpha blending + some
 other effect?) So my question was whether it should work now or not.

No, this won't fix the problem.  Mach64 can't do fog and blending at the
same time, and the tunnel demo uses blending for the menu.  There was some
discussion of trying to use software fogging per-vertex when hardware
blending is enabled, but no one has implemented it yet.
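
Roughly, the idea would be to evaluate the standard linear fog factor per
vertex on the CPU and fold it into the vertex colour before submission,
leaving the hardware blender free for the alpha blend; a sketch of that idea
(not an actual implementation):

    /* Linear fog as specified by OpenGL: f = (end - z) / (end - start),
     * clamped to [0,1], where z is the eye-space fog distance.  Applied
     * per vertex on the CPU so the hardware blend unit stays available. */
    static void fog_vertex_color(float rgb[3], float eye_z,
                                 const float fog_rgb[3],
                                 float fog_start, float fog_end)
    {
        float f = (fog_end - eye_z) / (fog_end - fog_start);
        int i;

        if (f < 0.0f) f = 0.0f;
        if (f > 1.0f) f = 1.0f;

        for (i = 0; i < 3; i++)
            rgb[i] = f * rgb[i] + (1.0f - f) * fog_rgb[i];
    }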

-- 
Leif Delgass 
http://www.retinalburn.net


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Raystonn



[Resending, fell into last night's black hole it seems.]

I am definitely all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based "scene-capture"
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.

-Raystonn

- Original Message -
From: "Brian Paul" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, April 01, 2002 6:36 AM
Subject: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


"José Fonseca" wrote:

 In these last few days I have been working on the Mesa software blending
 and the existing MMX bug. I've made some progress.

 I made a small test program which calls the relevant functions directly as
 Alex suggested. In the process I added comments to the assembly code
 (which had none). The error is due to the fact that the inner loop blends
 two pixels at the same time, so if the mask of the first element is zero
 then both are skipped. I also spotted some errors in the runin section,
 e.g., it ANDs with 4 and compares the result with 8 which is impossible...
 I still have to study the x86 architecture optimization a little further
 to know how to optimally fix both these situations.

 I also made two optimizations in blend_transparency (s_blend.c) which have
 no effect on the result precision but that achieved a global speedup of
 30% in the function. These optimizations are in the C code and benefit all
 architectures.

 The first was to avoid the repetition of the input variable in the DIV255.
 At least my version of gcc (2.96) wasn't factoring the common code out,
 yielding a 17% speedup.

 The second was to factor the blending equation, cutting the number of
 multiplications in half. This optimization can be applied in other places
 in this file as well.

Good work.  I'll review your changes and probably apply them to the Mesa
trunk (for version 4.1) later today.

 A third optimization that I'll try is the "double blend" trick (make two
 8-bit multiplications at the same time in a 32-bit register) as documented
 by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite
 interesting site referred to me by Brian).

I was going to do that someday too.  Go for it.

 I would like to keep improving Mesa software rendering performance. I know
 that due to its versatility and power Mesa will never rival a dedicated,
 non-conformant software 3D engine such as the Unreal one; nevertheless I
 think that it's possible to make it useful for simple realtime rendering.
 Regards,

Despite the proliferation of 3D hardware, there'll always be applications
for software rendering.  For example, the 16-bit color channel feature is
being used by several animation houses.

-Brian

___
Mesa3d-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Brian Paul

 Raystonn wrote:
 
 [Resending, fell into last night's black hole it seems.]
 
 I am definitely all for increasing the performance of the software renderer.
 Eventually the main system processor will be fast enough to perform all of
 this without the need for a third party graphics card.  The only thing video
 cards have today that is really better than the main processor is massive
 amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
 I foresee the need for video cards lessening in the future.  A properly
 implemented and optimized software version of a tile-based scene-capture
 renderer much like that used in Kyro could perform as well as the latest
 video cards in a year or two.  This is what I am dabbling with at the
 moment.

That's debatable.

My personal opinion is that special-purpose graphics hardware will
always perform better than a general-purpose CPU.  The graphics pipeline
is amenable to very specialized optimizations (both in computation and
the memory system) that aren't applicable to a general purpose CPU.

Of course, looking far enough into the future, all bets are off.

-Brian

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Alexander Stohr

   I don't think so.  I haven't noticed a problem with fog 
 in the tunnel demo.
  So it works for you, doesn't it? Envious.
  For me, the fog effect does not work. Some time ago, someone (Jose?)
  even explained that it should not work on mach64 (alpha 
 blending + some
  other effect?) So my question was whether it should work now or not.
 
 No, this won't fix the problem.  Mach64 can't do fog and 
 blending at the
 same time, and the tunnel demo uses blending for the menu.  
 There was some
 discussion of trying to use software fogging per-vertex when hardware
 blending is enabled, but no one has implemented it yet.

Jose was working on some MMX code that is currently disabled
in the Mesa source due to bugs in the coding. So the problems he fixed
could not have been the cause of what you are seeing. With that fix
you might spot some speedup on an MMX-capable CPU
if you are running specific Mesa demos.

Concerning the tunnel demo: as long as fogging is not required
(at least I think it is not) for rendering the alpha-blended 
help texts and the other informational texts, it would be best
if you just disable fogging while drawing these elements, as sketched
below. Consider that mode turn-off as a fix for some suboptimal
application coding.

(I should have a look at the source and 
check if or why it's not already done in that demo...)
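
A minimal sketch of that application-side workaround, assuming <GL/gl.h>
and a made-up draw_help_text() standing in for the demo's own text routine:

/* Draw the alpha-blended overlay text with fog temporarily disabled. */
#include <GL/gl.h>

extern void draw_help_text(void);   /* hypothetical demo routine */

static void draw_overlay_text_unfogged(void)
{
    GLboolean fog_was_on = glIsEnabled(GL_FOG);

    if (fog_was_on)
        glDisable(GL_FOG);          /* fog is not needed for the overlay */

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    draw_help_text();               /* alpha-blended help/info text */
    glDisable(GL_BLEND);

    if (fog_was_on)
        glEnable(GL_FOG);           /* restore the previous fog state */
}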

Regards, Alex.


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



RE: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Alexander Stohr

Hello Raystonn,

sorry, but dedicated ASIC hardware is always faster.
(You are a troll, aren't you?)

In the straightforward OpenGL case (flat and smooth shading)
you can turn on several features in the pixel path and in
the geometry pipeline (culling, 8x lighting, clipping) 
that you won't be able to perform at the same speed with 
a normal CPU setup.  It's not only the bandwidth, it's the
floating-point performance that the graphics chips are
capable of by means of multiple parallel and dedicated
FPU units.

For the pixel path, when (multi)texturing is enabled, or alpha blending,
fogging or something else that does readback (stencil buffer,
depth-buffer-dependent operations, anti-aliased lines), then
you will find that a classical CPU-based system will not
perform at its best when doing pixel manipulations of that sort.

I think regular graphics hardware can clear your framebuffer
in a fraction of the time that a CPU-mainboard pairing can.
That's been the case since the good old IBM VGA from ages ago.

And don't tell me a UMA architecture is better in that case. 
You first have to accept that the RAMDAC is time-sharing the
same bus system and therefore permanently consumes bus cycles.
But if rasterisation has separate memory with an option for a
wider bus, separate caches and higher-clocked memory, you will 
get better performance by design.

Regards, Alex.


-Original Message-
From: Raystonn [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 02, 2002 19:45
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


[Resending, fell into last night's black hole it seems.]

I am definitely all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based scene-capture
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.

-Raystonn


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Stephen J Baker


Gack!  I'm *so* sick of hearing this argument...

On Tue, 2 Apr 2002, Raystonn wrote:

 I am definitely all for increasing the performance of the software renderer.

Yes.

 Eventually the main system processor will be fast enough to perform all of
 this without the need for a third party graphics card.

I very much doubt this will happen within the lifetime of silicon chip
technology.  Maybe with nanotech, biological or quantum computing - but
probably not even then.

  The only thing video
 cards have today that is really better than the main processor is massive
 amounts of memory bandwidth.

That is far from the truth - they have internal pipelining
and parallelism.  Their use of silicon can be optimised to balance
the performance of just one single algorithm.  You can never do that
for a machine that also has to run an OS, word process and run
spreadsheets.

  Since memory bandwidth is increasing rapidly,...

It is?!?  Let's look at the facts:

Since 1989, CPU speed has grown by a factor of 70.  Over the same
period the memory bus has increased by a factor of maybe 6 or so.

Caching can go some way to hiding that - but not for things like
graphics that need massive frame buffers and huge texture maps.
Caching also makes parallelism difficult and rendering algorithms
are highly parallelisable.  PC's are *horribly* memory-bound.

 I foresee the need for video cards lessening in the future.

Whilst memory bandwidth inside the main PC is increasing, it's doing
so very slowly - and all the tricks it uses to get that speedup are equally
applicable to the graphics hardware (things like DDR for example).

On the other hand, the graphics card can use heavily pipelined
operations to guarantee that the memory bandwidth is 100% utilised
- and can use an arbitrarily large amount of parallelism to improve
throughput.  The main CPU can't do that because its memory access
patterns are not regular and it has little idea where the next byte
has to be read from until it's too late.

Also, the instruction set of the main CPU isn't optimised for the
rendering task - where that is the ONLY thing the graphics chip
has to do.  The main CPU has all this legacy crap to deal with because
it's expected to run programs that were written 20 years ago.
Every generation of graphics chip can have a totally redesigned
internal architecture that exactly fits the profile of today's
RAM and silicon speeds.

You only have to look at the gap you are trying to bridge - a
modern graphics card is *easily* 100 times faster at rendering
sophisticated pixels (with pixel shaders, multiple textures and
antialiasing) than the CPU.

 A properly
 implemented and optimized software version of a tile-based scene-capture
 renderer much like that used in Kyro could perform as well as the latest
 video cards in a year or two.  This is what I am dabbling with at the
 moment.

I await this with interest - but 'scene capture' systems tend to be
unusable with modern graphics APIs... they can't run either OpenGL
or Direct3D efficiently for arbitrary input.  If there were to be
some change in consumer needs that would result in 'scene capture'
being a usable technique - then the graphics cards can easily take
that on board and will *STILL* beat the heck out of doing it in
the CPU.  Scene capture is also only feasible if the number of
polygons being rendered is small and bounded - the trends are
for modern graphics software to generate VAST numbers of polygons
on-the-fly precisely so they don't have to be stored in slow old
memory.

Everything that is speeding up the main CPU is also speeding up
the graphics processor - faster silicon, faster busses and faster
RAM all help the graphics just as much as they help the CPU.

However, increasing the number of transistors you can have on
a chip doesn't help the CPU out very much.  Their instruction
sets are not getting more complex in proportion to the increase
in silicon area - and their ability to make use of more complex
instructions is already limited by the brain power of compiler
writers.  Most of the speedup in modern CPU's is coming from
physically shorter distances for signals to travel and faster
clocks - all of the extra gates typically end up increasing the
size of the on-chip cache which has marginal benefits to graphics
algorithms.

In contrast to that, a graphics chip designer can just double
the number of pixel processors or something and get an almost
linear increase in performance with chip area with relatively
little design effort and no software changes.

If you doubt this, look at the progress over the last 5 or 6
years.  In late 1996 the Voodoo-1 had a 50Mpixel/sec fill rate.
In 2002 GeForce-4 has a fill rate of 4.8 Billion (antialiased)
pixels/sec - it's 100 times faster.  Over the same period,
your 1996 233MHz CPU has gone up to a 2GHz machine ...a mere
10x speedup.  The graphics cards are also gaining features.
Over that same period, they added - windowing, hardware TL,
antialiasing, 

Re: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending

2002-04-02 Thread Raystonn

I am definitely all for increasing the performance of the software renderer.
Eventually the main system processor will be fast enough to perform all of
this without the need for a third party graphics card.  The only thing video
cards have today that is really better than the main processor is massive
amounts of memory bandwidth.  Since memory bandwidth is increasing rapidly,
I foresee the need for video cards lessening in the future.  A properly
implemented and optimized software version of a tile-based scene-capture
renderer much like that used in Kyro could perform as well as the latest
video cards in a year or two.  This is what I am dabbling with at the
moment.
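
For readers unfamiliar with the term: a "scene-capture" (tile-based) renderer
buffers the whole scene first and then rasterizes one small screen tile at a
time, so the working set stays in cache.  A rough sketch of the binning step,
with made-up types and a fixed 32x32 tile size (not Raystonn's or Kyro's
actual code):

/* Hypothetical triangle binning for a tile-based renderer: each triangle
 * is added to every 32x32-pixel tile its bounding box touches; the tiles
 * are shaded later, one at a time. */
#define TILE 32

struct tri { float x[3], y[3]; /* plus colors, depth, etc. */ };

struct tile_bin { int count, cap; int *tri_ids; };

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static void bin_triangle(struct tile_bin *bins, int tiles_x, int tiles_y,
                         const struct tri *t, int id)
{
    float minx = t->x[0], maxx = t->x[0], miny = t->y[0], maxy = t->y[0];
    int i, tx, ty;

    for (i = 1; i < 3; i++) {           /* screen-space bounding box */
        if (t->x[i] < minx) minx = t->x[i];
        if (t->x[i] > maxx) maxx = t->x[i];
        if (t->y[i] < miny) miny = t->y[i];
        if (t->y[i] > maxy) maxy = t->y[i];
    }

    for (ty = clampi((int)miny / TILE, 0, tiles_y - 1);
         ty <= clampi((int)maxy / TILE, 0, tiles_y - 1); ty++)
        for (tx = clampi((int)minx / TILE, 0, tiles_x - 1);
             tx <= clampi((int)maxx / TILE, 0, tiles_x - 1); tx++) {
            struct tile_bin *b = &bins[ty * tiles_x + tx];
            if (b->count < b->cap)      /* bin growth omitted for brevity */
                b->tri_ids[b->count++] = id;
        }
}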

-Raystonn


- Original Message -
From: Brian Paul [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, April 01, 2002 6:36 AM
Subject: [Mesa3d-dev] Re: [Dri-devel] Mesa software blending


José Fonseca wrote:

 In these last few days I have been working on the Mesa software blending
 and the existing MMX bug. I've made some progress.

 I made a small test program which calls the relevant functions directly as
 Alex suggested. In the process I added comments to the assembly code
 (which had none). The error is due to the fact that the inner loop blends
 two pixels at the same time, so if the mask of the first element is zero
 then both are skipped. I also spotted some errors in the runin section,
 e.g., it ANDs with 4 and compares the result with 8 which is impossible...
 I still have to study the x86 architecture optimization a little further
 to know how to optimally fix both these situations.
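
For context, the scalar behaviour that the two-pixel MMX loop has to
reproduce looks roughly like this (illustrative only, not the actual Mesa
span code):

/* Every pixel carries its own mask byte; a zero mask on the first pixel
 * of a pair must not cause the second pixel to be skipped. */
typedef unsigned char GLubyte;

static void blend_span_masked(GLubyte rgba[][4], const GLubyte dest[][4],
                              const GLubyte mask[], int n)
{
    int i, c;
    for (i = 0; i < n; i++) {
        if (!mask[i])
            continue;               /* skip only THIS pixel */
        for (c = 0; c < 3; c++) {   /* blend RGB by the source alpha */
            int s = rgba[i][c], d = dest[i][c], a = rgba[i][3];
            rgba[i][c] = (GLubyte)(d + ((s - d) * a) / 255);
        }
    }
}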

 I also made two optimizations in blend_transparency (s_blend.c) which have
 no effect on the result precision but that achieved a global speedup of
 30% in the function. These optimizations are in the C code and benefit all
 architectures.

 The first was to avoid the repetition of the input variable in the DIV255.
 At least my version of gcc (2.96) wasn't factoring the common code out,
 yielding a 17% speedup.

 The second was to factor the blending equation, cutting the number of
 multiplications in half. This optimization can be applied in other
 places in this file as well.
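
To make those two optimizations concrete, here is an illustrative
single-channel version, with my own DIV255 approximation and function name
(not the actual s_blend.c code):

/* DIV255 here is one common integer approximation of x/255 for
 * 0 <= x <= 65025; Mesa's real macro may differ. */
#define DIV255(X)  (((X) + ((X) >> 8) + 1) >> 8)

static unsigned char blend_channel(unsigned char src, unsigned char dst,
                                   unsigned char alpha)
{
    /* Naive form: DIV255(src*alpha + dst*(255-alpha)) costs two multiplies,
     * and handing the whole expression to the macro makes it evaluate that
     * expression more than once. */

    /* Factored form: the same value written as dst*255 + (src-dst)*alpha
     * needs a single true multiply, and the intermediate is kept in a
     * temporary so the macro only re-reads a plain variable. */
    int t = (dst << 8) - dst + (src - dst) * alpha;
    return (unsigned char) DIV255(t);
}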

Good work.  I'll review your changes and probably apply them to the Mesa trunk
(for version 4.1) later today.


 A third optimization that I'll try is the double blend trick (make two
 8-bit multiplications at the same time in a 32-bit register) as documented
 by Michael Herf (http://www.stereopsis.com/doubleblend.html - a quite
 interesting site referred to me by Brian).
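
The idea, in plain C on a packed 0x00RRGGBB pixel, is to leave an empty byte
between channels so one 32-bit multiply blends two of them at once. A rough
sketch, with my own helper name and alpha scaled to 0..256 (not the code Herf
or José actually wrote):

/* Blend src over dst; 'a' is the source alpha scaled to 0..256. */
static unsigned int double_blend_0888(unsigned int src, unsigned int dst,
                                      unsigned int a)
{
    unsigned int srb = src & 0x00FF00FFu, drb = dst & 0x00FF00FFu;
    unsigned int sg  = src & 0x0000FF00u, dg  = dst & 0x0000FF00u;

    /* one 32-bit multiply handles red and blue together; the empty byte
     * between them absorbs the 16-bit intermediate products */
    unsigned int rb = ((srb * a + drb * (256u - a)) >> 8) & 0x00FF00FFu;
    unsigned int g  = ((sg  * a + dg  * (256u - a)) >> 8) & 0x0000FF00u;

    return rb | g;
}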

I was going to do that someday too.  Go for it.


 I would like to keep improving Mesa software rendering performance. I know
 that due to its versatility and power Mesa will never rival a
 dedicated, non-conformant software 3D engine such as the Unreal one;
 nevertheless I think it's possible to make it useful for simple
 realtime rendering. Regards,

Despite the proliferation of 3D hardware, there'll always be applications
for software rendering.  For example, the 16-bit color channel feature is
being used by several animation houses.

-Brian

___
Mesa3d-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel