General-purpose programming on a GPU is useful for algorithms that are computationally intensive, can be parallelized with little overhead, have fine-grained per-thread computations, have little (and similar) control flow per thread, and at the same time do regular data access (for example, array-based data is regular while pointer-based data is irregular). GPUs use massive multi-threading to hide memory latency.
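As a rough illustration of the regular-vs-irregular distinction, here is a minimal Python sketch (no GPU involved; only the access pattern matters). Indexed array access lets each element be processed independently, which is exactly what maps one-thread-per-element onto a GPU, whereas a linked list forces sequential pointer chasing. The `Node` class and function names are just illustrative:

```python
# Regular access: element i is addressed directly by index, so a GPU
# could assign one thread per index with no dependence between iterations.
def scale_array(data, factor):
    return [x * factor for x in data]  # each i is independent -> parallelizable

# Irregular access: the next element's address is only known after
# following the previous pointer, which serializes the traversal.
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def scale_list(head, factor):
    out = []
    node = head
    while node is not None:            # inherently sequential pointer chase
        out.append(node.value * factor)
        node = node.next
    return out
```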
CPUs, on the other hand, are essentially meant to run control-intensive code (lots of conditional code: branch predictors help here), irregular memory accesses (the memory hierarchy of register file and L1/L2/L3 caches helps here), and coarse-grained multi-threaded applications (multi-threaded processor architectures and Hyper-Threading help here). The memory hierarchy plus hardware multi-threading is what hides memory latency.

For a given algorithm, thousands of threads run on a GPU compared to the handful (at most a few hundred) that would run on a CPU. There is no general rule that says an algorithm of O(n^3) complexity will run faster on a CPU or on a GPU; my answer would be that it depends. It depends on many other things about the algorithm (data-structure layout, floating-point calculations, etc.) and on the available hardware options and their architecture. One criterion for choosing is the number of calculations per memory access: the higher this value, the better suited the algorithm is to a GPU rather than a CPU, and vice versa. I suggest you also ask this question on a computer architecture forum.

Thanks
Varun

On Mon, Apr 9, 2012 at 1:21 PM, vikas <vikas.rastogi2...@gmail.com> wrote:
> Hey Arun, Ilya,
> GPUs are faster because they are:
>
> 1. designed for graphics processing, which involves a lot of matrix
> processing; a simple example is the transformation of matrices into
> various views (projection, model, and viewport, sometimes needed even
> in real time), so these computations are done in parallel
> 2. doing all or most processing at high precision; unless one
> specifies otherwise, everything is a 'double' computation, which is
> quite costly even on a modern CPU ALU
> 3. not just about computations: a lot of other parallel architectural
> advantages (e.g. cache) give normal algorithms better speedup than a
> CPU does
>
> hope it clarifies.
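Varun's calculations-per-memory-access criterion can be made concrete for the O(n^3) matrix-multiply example this thread keeps coming back to. A naive n x n multiply does about 2n^3 floating-point operations but only touches about 3n^2 matrix elements (read A and B, write C), so its arithmetic intensity grows linearly with n, which is why large matrix multiplies tend to favor the GPU. A small sketch using those standard textbook counts (nothing here is measured):

```python
def arithmetic_intensity(n):
    """Flops per element touched for a naive n x n matrix multiply."""
    flops = 2 * n**3        # n^3 multiply-adds = 2n^3 floating-point ops
    elements = 3 * n**2     # read A and B, write C
    return flops / elements # grows as (2/3) * n

# Intensity grows with n, so bigger matrices amortize memory traffic better:
for n in (16, 256, 4096):
    print(n, arithmetic_intensity(n))
```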
> So if you are planning to start on the GPU, start thinking
> multi-threaded.
>
> Copying data generally involves a separate DMA engine. I worked with
> USB and PCI 66 MHz connections between CPU and GPU, and they did not
> seem slow; even the Fujitsu CoralPA, which has a very slow DMA and a
> PCI 33 connection, was OK.
>
> On Apr 8, 4:04 am, Ilya Albrekht <ilya.albre...@gmail.com> wrote:
> > Hey Phoenix,
> >
> > It is true that current GPUs have far better floating-point
> > throughput than any general-purpose processor. But when you want to
> > run your algorithm on the GPU there is always the overhead of
> > copying data between CPU and GPU, plus drivers and other system
> > calls, and you can still gain performance despite those overheads if
> > you have a lot of calculations (more calculations, less overhead %).
> > And I assume that in general you have to do at least O(n^3)
> > calculations to gain any performance.
> >
> > In my experience the same thing applies to SSE vectorization - it
> > doesn't make sense to vectorize a loop of fewer than ~25-27
> > iterations, because the overhead of preparing data and aligning
> > buffers would be too high.
> >
> > On Saturday, 7 April 2012 08:54:20 UTC-7, phoenix wrote:
> > > @SAMM: what about general mathematical computations such as matrix
> > > multiplication, which is O(n^3)? How do you relate your
> > > explanation to such math computations, or to any algorithm of at
> > > least O(n^3)?
> > >
> > > On Sat, Apr 7, 2012 at 3:22 AM, SAMM <somnath.nit...@gmail.com> wrote:
> > >
> > >> This is because the GPU is multithreaded. In graphics there are
> > >> three main steps. First comes application-based work, where
> > >> vertex processing, reading the data, and pixel processing are
> > >> done.
> > >> Secondly comes culling, which determines which portion will be
> > >> shown given the line of sight. This also checks for any
> > >> intersection with other objects.
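Ilya's point about copy overhead can be put into a back-of-the-envelope model: an O(n^3) kernel only moves O(n^2) data, so past some n the transfer cost is dwarfed by compute. The throughput numbers below are made-up placeholders, not measurements of any real hardware; the shape of the model is what matters:

```python
def transfer_fraction(n, flops_per_sec=1e12, bytes_per_sec=5e9):
    """Fraction of total time spent copying data for an n x n matmul.
    The flop rate and bus bandwidth are hypothetical placeholders."""
    compute = (2 * n**3) / flops_per_sec       # O(n^3) work on the device
    transfer = (3 * n**2 * 8) / bytes_per_sec  # O(n^2) doubles over the bus
    return transfer / (transfer + compute)

# The copy-overhead percentage shrinks as n grows ("more calculations,
# less overhead %"):
for n in (64, 512, 4096):
    print(n, round(transfer_fraction(n), 3))
```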
> > >> For instance, a man standing behind a building should not be
> > >> visible to us in graphics, or only some portion of his body will
> > >> be shown; resolving this intersection is called rendering.
> > >> The third step is draw: to finally draw the model.
> > >> These three processes are done multithreaded, in parallel, giving
> > >> 3x processing speed.
> > >> You can refer to the link below:
> > >> http://www.panda3d.org/manual/index.php/Multithreaded_Render_Pipeline
> > >
> > > --
> > > "People often say that motivation doesn't last. Well, neither does
> > > bathing - that's why we recommend it daily."

--
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
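SAMM's three stages (application work, culling, draw) overlap like an assembly line: while frame k is being drawn, frame k+1 can be culled and frame k+2 prepared, which is where the roughly 3x throughput comes from. A minimal Python sketch of such a staged pipeline using threads and queues (stage names follow the post; the actual Panda3D pipeline linked above is far more involved):

```python
import threading
import queue

def stage(fn, inq, outq):
    """One pipeline stage: pull a frame, transform it, pass it downstream."""
    def worker():
        while True:
            frame = inq.get()
            if frame is None:      # poison pill: forward it and stop
                outq.put(None)
                break
            outq.put(fn(frame))
    t = threading.Thread(target=worker)
    t.start()
    return t

# Three stages named after the post: application work, culling, draw.
q_app, q_cull, q_draw, q_out = (queue.Queue() for _ in range(4))
threads = [
    stage(lambda f: f + ":app",  q_app,  q_cull),
    stage(lambda f: f + ":cull", q_cull, q_draw),
    stage(lambda f: f + ":draw", q_draw, q_out),
]

for frame in ("frame0", "frame1", "frame2"):
    q_app.put(frame)               # frames stream in; stages overlap in time
q_app.put(None)
for t in threads:
    t.join()

results = []
while True:
    item = q_out.get()
    if item is None:
        break
    results.append(item)
print(results)
```

Each queue hands frames between adjacent stages, so the three workers run concurrently once the pipeline fills; FIFO queues keep frames in order.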