Hey Raystonn,

Oh my goodness, who fed that troll? ;-)
Let's still assume you didn't have the facts at hand (due to your age, education, place of birth, or current location) to see clearly in all the subjects you are trying to address. I would be much happier if I felt you were really working with at least a few of the concepts and tools you are referring to. The web is big, and there is not much to hinder you from getting hands-on with one after the other, if you really want to. Maybe I am mistaken about you, and you might want to enlighten me as to which successful computer projects on earth are the result of your bold mind. (Don't exclude computer projects for spacecraft.)

> > That is far from the truth - they have internal pipelining
> > and parallelism. Their use of silicon can be optimised to balance
> > the performance of just one single algorithm. You can never do that
> > for a machine that also has to run an OS, word process and run
> > spreadsheets.
>
> Modern processors have internal pipelining and parallelism as
> well.

But to a much lower degree. Please understand that a Pentium 4 with only a single 64-bit FPU for a 128-bit SSE operand is far away from the performance that a dedicated graphics pipeline with some four dozen FPUs can deliver. And some of those FPUs even work in parallel with their neighbours...

> Most of the processing power of today's CPUs goes completely unused.

One more reason why a dedicated graphics chip with about the same number of transistors as a CPU (which is almost true today) outperforms the competing CPU by huge factors.

> It is possible to create optimized implementations using
> Single-Instruction-Multiple-Data (SIMD) instructions of
> efficient algorithms.

As long as the optimized version only improves things by some 30% to 50% (or at most some 100%), it will never come close to what graphics chips do by default.

> > > Since memory bandwidth is increasing rapidly,...
> >
> > It is?!?
Let's look at the facts:

> > Since 1989, CPU speed has grown by a factor of 70. Over the same
> > period the memory bus has increased by a factor of maybe 6 or so.
>
> We have gone from approximately 200MB/s of memory bandwidth (PC66 EDO RAM)
> to over 3.2GB/s (dual 16-bit RDRAM channels) in the last 5 years. We have
> over 16 times the memory bandwidth available today than we did just 5 years
> ago. Available memory bandwidth has been growing more quickly than
> processor clockspeed lately,

No - you are still mistaken...

CPU clock rate growth         = 70  (from 33 to 2400 MHz)
CPU register width growth     = 4   (from 32 to 128 bits)
CPU pipelining speed increase = 2   (worst-case assumption)
CPU performance growth, total = 560 = 70 * 4 * 2

According to your own sample:

memory bandwidth growth, total = 16 (taken from your numbers)

Hmm, my 1990 system had some DIL RAMs marked "-70" and "-80", which is the latency in nanoseconds. Some cache RAM was already down to about 20 ns. If you assume DDR is at 2.5 ns today, I get a factor of 32. Further assuming a factor of 2 for the bus width increase, we are at an overall increase of only 64. Generously adding a factor of 2 for bus clocking optimisations, we are still facing a speed increase of only 128 for the RAM, while the CPU speed increase was determined above to be 560.

> and I do not foresee an end to this any time soon.

That doesn't add anything to the argument. Sorry.

> > On the other hand, the graphics card can use heavily pipelined
> > operations to guarantee that the memory bandwidth is 100% utilised
>
> Overutilised in my opinion.

Not sure what you want to say with that. Beyond 100% there is nothing but saturation. A well-tuned system utilises all units at 100% in its main application case. In the less common cases one of the units will run at 100% while the other components idle at a smaller percentage.
> The amount of overdraw performed by today's
> video cards on modern games and applications is incredible.

No, that is a sign of bad application design. ;-) And anyway, a good video card will already eliminate 50% or more of the supplied data in early pipeline stages, especially when a dumb application sends everything to the adapter.

> Immediate mode rendering is an inefficient algorithm.

Agreed, but you can only make efficient use of "instant" display-list rendering if your scenery has largely constant geometry data. It is a question of what you are doing, and a question of the application.

> Video cards tend to have extremely well optimized implementations
> of this inefficient algorithm.

I don't feel that you know much about OpenGL and its core design principles. I would recommend reading the "red book" to get into the subject. You might find out that most of the above is in no way required when doing OpenGL-based rendering. And you will further see that even most of the cases you suppose to be damaging to performance a) won't occur often and b) get nicely eliminated by hardware at a pretty early stage. The only thing not covered is the case where you insist on writing dumb code yourself. (That only happens when the basics aren't understood.)

> Modern processors have a considerable amount of parallelism
> built in.

It is all relative. Two integer ALUs are a lot compared to one, but only a laugh compared to the 128-channel-to-128-channel parallel pipelined digital switch-matrix ICs running at some 5 GHz, which were introduced last year for later use in communications satellites. I cannot specify the exact number of integer and logic ALUs in a graphics chipset, but they are numerous and in heavy parallel use.

> With advanced prefetch and streaming SIMD instructions it is very
> possible to do these types of operations in a modern processor.

You are talking about an extension of the CPU's command set?
Notice that any new command in a current processor architecture makes opcode decoding more complex. Hey, sure, you can do anything in software - even simulate a graphics chip's operations. But that is not the point; the only question is which is faster. At the very least it is simpler to say "x=3, y=4, length=5, DRAW_SEMI_TRANSPARENT_LINE" to a graphics processor and do something else in the meantime, instead of doing it all on your main processor. And if you say it is fast enough for you, then I can only tell you that it heavily depends on what you want to do.

> It will, however, take another couple of years to be able to render at great
> framerates and high resolutions.

Interesting point of view. In a few years CPUs will have evolved in several areas, and graphics processors will have evolved as well. So the distance will still exist, which will give you a good reason to use dedicated graphics hardware for the graphics. You also have to consider that your requirements for a computer system will have evolved in that time frame: maybe a fully 3D desktop, or stereo 3D on the desktop via shutter glasses, or high-resolution cinemascope movies from highly compressed MPEG-4 and successor file formats.

> > You only have to look at the gap you are trying to bridge - a
> > modern graphics card is *easily* 100 times faster at rendering
> > sophisticated pixels (with pixel shaders, multiple textures and
> > antialiasing) than the CPU.
>
> They are limited in what they can do.

No doubt. But if it fulfils the purpose, why should you care? Maybe you have a minivan but still get the gas for your stove via a pipeline - and you never think about changing that.

> In order to allow more flexibility they have recently introduced
> pixel shaders, which basically turns the video card into a mini-CPU.

Nice point - it is a RISC engine, then. It has a pretty compressed command set for doing everything from inside the caches. The main target of such engines is a higher level of customisation; so far correct.
But it is still dedicated hardware. Forming a curve from a line geometry plus a formula is nothing new at all - read the "red book" about NURBS and polylines. The advantage is that only a little data has to be transferred to the adapter, while the results are wide and fed directly to the render circuits at the maximum rate the silicon allows. That is much more than you could ever do over any external bus system.

> Modern processors can perform these features more quickly and would
> allow an order of magnitude more flexibility in what can be done.

I doubt the "more quickly" rather strongly. And once they start storing their results to main memory, all the gain may be lost. A typical OpenGL setup might have some 5 kB to 20 kB of important state that can all impact the results of a geometry calculation and rendering. You cannot keep all these parameters in CPU registers, and the much bigger program for the corresponding software renderer won't fit your fastest code caches.

> Kyro-based video cards perform quite well.

Hmm, is SGI's Fahrenheit API still alive? I think only scene-graph-based applications have a chance of bringing the Kyro a significant performance boost. But that means you would have to throw away most of your existing software and buy a big pile of new software to satisfy that silicon's needs. Or have I overlooked a path that allows a soft migration while already delivering the hidden performance now?

> They are not quite up to the level of nVidia's latest cards but this is
> new technology and is being worked on by a relatively new company.

What is new about the Kyro? I think there is not much new in it. Z-buffer sorting algorithms for minimising the number of finally drawn pixels aren't new; they have been known since the first "Castle Wolfenstein" releases, back when I still had my C64 powered on for regular use. Tiled rendering? Nothing new there either.
Having a limited viewport for rendering a bigger scene is something George Lucas has used in his labs for as long as hardware rendering has been in use there. Merging several sub-images into one big image is nothing that thrills an insider anymore.

> These cards do not require nearly as much memory bandwidth as
> immediate-mode renderers, performing 0 overdraw.
> They are more processing intensive rather than being bandwidth intensive.
> I see this as a more efficient algorithm.

So you suddenly favour complex graphics processors over CPUs? Let me say: fogging of various sorts, alpha blending, stencilling, and whatever else is used to make rendered images look more realistic all imply that framebuffer reads and writes are performed (unless you use that high-priced and nearly dead 3D memory concept that E&S/Mitsubishi offered for some time, e.g. for the FireGL 4000). Conclusion: if you render simple images, you might have a chance of a slight performance boost with specific hardware technologies, but as soon as you turn to more realistic images, you will only gain performance with well-designed memory interfaces, high-performance caching, and a chipset that runs at high clock rates with wide parallelism.

> Everything starts out in hardware and eventually moves to software.

Things only move to software if the CPU gets fast enough and the respective hardware is too costly. Think of the DVD accelerators that made DVD playback possible on a P-166. They were replaced by software players once the P-500 with MMX appeared. But DCT/iDCT is still in current graphics processors, because it makes much more sense there and doesn't cost $100 in total but only some $0.002 per graphics unit. While some external source pumps the data in one format to the graphics chip, the operation is applied immediately and the data is written in the result format to the framebuffer. Compare this to all the data streams you would cause if a CPU did the same...
> There will come a time when the basic functionality provided by
> video cards can be easily done by a main processor.

Ah, so you agree that a CPU needs complex coding to do such easy stuff. Maybe that is because the main purpose of a CPU is not what you want it to do. Even if it could do the job as easily as the graphics chipset, the chipset would still do it faster because of its strategically optimal location between main memory and framebuffer memory. And furthermore it is a pretty nice and attractive kind of multiprocessing, which benefits overall system availability.

> The extra features offered by the video cards, such as pixel shaders,
> are simply attempts to stand-in as a main processor.

Typically, consumer devices have the _smallest_ processor as their central circuit. These central processors only have to query the status of the other devices, check the keyboard, and initiate activity. I don't think a graphics chipset will take over the main CPU's duties. Don't worry. ;-)

For general tasks: the CPU.
For graphics: the graphics core.

At least there has been no graphics core yet that had its own OS or "animated" a whole productive environment just because it had some sort of ROM attached.

> Once the basic functionality of the video card can be performed
> by the main system procsesor, there will really be no need for extra
> hardware to perform these tasks.

Are you again thinking of something like UMA? (See my other mail.) Try urging a Pentium 4+X to supply a totally uninterrupted data stream to a digital-to-analog converter (DAC) with nanosecond precision in order to keep an image on some screen (or multiple screens). If you now say the screen is smart and only needs partial updates, then I will tell you that in that case the screen includes the graphics chipset in some form. We saw this decades ago, when PostScript-capable monitors (like printers) were available for some old Unix workstations. If you know the GLX remote rendering protocol, it is something like this.
> What I see now is a move by the video card companies to software-based
> solutions (pixel shaders, etc.). They have recognized that there are
> limitations to what specialized hardware can do and they are now attempting
> to allow programmers more flexibility.

They provide another feature that plugs into the high-speed graphics chipset and performs much better there than it could if programmed remotely.

> However, this is the kind of functionality where the main system
> processor has a huge advantage.

The goal is not to program just anything via the graphics chip, but to allow several nice things of repeated use while providing the best possible performance. ;-) See this:

CPU --- programmable pipelined graphics chipset --- pipelined graphics chipset

The left side provides the most programming freedom, with unlimited complexity. The right side provides the most performance, at a fixed complexity of programming. The middle provides the performance of the right combined with a programming freedom that meets almost 90% of existing desires. So you now get 9 out of 10 of your previous CPU tasks solved at maximum speed; only for 1/10 of the cases do you still have to stick with slower CPU operations.

> If more features are added in this manner (as software) then the
> specialized video card hardware will lose its edge.

As long as graphics chips evolve as quickly as CPUs, there is no reason to predict that the power ratio between the two will ever change. Therefore I see no reason why one should make the other obsolete.

> Intel is capable of pushing microprocessor technology more quickly
> than nVidia or ATI, regardless of how much nVidia wants their technology
> to be at the center of the chipset.

Intel bought a graphics vendor ages ago. As far as I remember it was Real3D, a Lockheed Martin subsidiary, which stood behind the first Intel x86-PC graphics ever, the i740.
There were successors like the i815 and i830 mainboard chipsets with integrated graphics, partially with audio and modem support. But those were neither highly impressive nor seen in a wide range of computers - only in all-in-one office desktops. (ATI and nVidia are doing embedded chipsets as well, so there is not much to say.) In hindsight, the i740 was mainly made to promote the Intel-owned AGP bus. Yes, Intel is in the 3D graphics business, but there is no sign that they want to change their current level of presence dramatically. If you know more, just tell me!

> What would you call MMX, SSE, SSE2, and even 3dnow? These are additional
> instructions designed to optimize the use of these new transistors.

They are command-set extensions, and you must take special care and effort to make use of them. Their use is limited to specific sequences in the graphics code. I think only some 20% of those opcodes might come into effect if you run e.g. Mesa software rendering. Jose recently did a fix for such code, and if you had seen the patches he sent to the list, you would know it was neither a simple nor a fast task to fix the assembler statements.

> > instructions is already limited by the brain power of compiler
> > writers.
>
> Since when can you write a pixel shading routine in a standard C/C++ compiler?

Since I saw that handwritten assembler most often yields slower code than C code that the compiler was allowed to optimize automatically. Get your hands on VTune and become aware of all those 99 assembler tricks that the compiler can apply better than your brain can. BTW, I think the amount of C++ code in all the OpenGL implementations out there is below 3%. In the end it is simply overhead that slows the code down significantly.

> Assembly language can be used for the main processor just as easily
> as it can be used for pixel shaders using nVidia's own assembly language.
But it will result in much faster execution on the dedicated graphics RISC engine. So why code for the CPU in assembler, when the code executes equally fast written in C, or much faster written as microcode for the graphics chipset?

> In fact, there is a great deal more support for assembly language
> on the main processor.

Ouch. I would hand-code binary opcodes for double the coding time if I got a significant performance increase from the hardware for an indefinite runtime.

> Modern processors have multiple parallel units for both
> integer and FPU operations.

Nice, but if I have data of only one sort, that is of no use. That is the typical case once you have decided on one of MMX, SSE, 16-bit math, ... And interleaving such data processing within a single piece of code won't work. Hyperthreading might utilise both units if there are two different code paths of which neither consumes the full bandwidth to memory - and only in theory.

> Increasing processor performance is much more complex than a simple die shrink.

And therefore graphics performance is likely to explode much faster than CPU performance. A graphics core is more straightforward - but that was already said in this thread.

> Fill rate is just memory bandwidth. It is not hard to offer more memory
> channels. In fact, a dual-channel DDR chipset is coming soon for the
> Pentium 4. In May the Pentium 4 will have access to 4.3GB/s of memory
> bandwidth. Future generations will offer considerably more.

Don't assume graphics chipset development has reached a standstill today. Everything is evolving, for memories and buses alike, so it is not even certain which component will gain that sort of advance first and which next.

> The Intel C/C++ compiler generates MMX, SSE, and SSE2 instructions if you
> tell it to do so. It requires no inline assembly, though inline assembly is
> always a good idea. SSE and SSE2 are used in nVidia's drivers...
Have you never heard of compiler "intrinsics", as introduced by Intel some three years ago? Have you never looked into supercomputing compiler-optimisation issues? The tools are there to do it at an abstraction level - but as soon as dedicated hardware is in place for your tasks, it can easily outdo your CPU.

> > CONCLUSION.
> > ~~~~~~~~~~~
> > There is no sign whatever that CPU's are "catching up" with
> > graphics cards - and no logical reason why they ever will.
>
> I will have to disagree here. Indications are that the video card
> manufacturers are looking more and more into 'programmable'
> features such as the pixel shaders.

Yes? And it boosts performance far beyond the level a CPU of the same complexity could ever provide. And there are no signs that the complexity of CPUs and graphics chipsets will drift widely apart.

> If this is the case it would be relatively easy for the
> main processor to 'catch up'. Programmability is its specialty.

Programmability is exactly what keeps it from the ultimate performance levels. Ultimate programmability is nice, but was never the graphics vendors' intention. It took Intel some 20 years to catch up with general RISC designs (like the Motorola m68k), and the DEC Alpha series still outdoes the current Intel designs. Trying to catch up with dedicated on-die RISC engines serving as front gates to graphics processors is a no-win goal. Intel would be crazy to put their engineering power into such a subject, and I don't think AMD will either. (At least they are still doing nice command-set speedups for their CPUs today.)

> At any rate, we will probably just have to agree to disagree here. ;)
>
> -Raystonn

I strongly disagree with your overall opinion. I merely find myself doing some reasoning where your insights were possibly too far from the facts. So that is it.

Regards,
Alex.

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel