Sorry for the typos.

On Mon, Apr 9, 2012 at 1:53 PM, Varun Nagpal <varun.nagp...@gmail.com> wrote:
> GP programming on a GPU is useful for algorithms that are computationally
> intensive, can be parallelized with little overhead, have small per-thread
> granularity, have little (and similar) control flow per thread, and at the
> same time perform regular data accesses (for example, array-based data is
> regular while pointer-based data is irregular). GPUs use massive
> multi-threading to hide memory latency.
>
> CPUs are essentially meant to run control-intensive code (lots of
> conditional code: branch predictors help here), irregular memory accesses
> (the memory hierarchy of register file and L1/L2/L3 caches helps here),
> and coarse-grained multi-threaded applications (multi-threaded processor
> architectures and Hyper-Threading help here). The memory hierarchy plus
> hardware multi-threading is used to hide memory latency.
>
> For a given algorithm, thousands of threads run on a GPU, compared to the
> handful (at most a few hundred) that would run on a CPU.
>
> There is no general rule saying that an algorithm of O(n^3) complexity
> will run faster on a CPU or on a GPU. My answer would be: it depends. It
> depends on many other properties of the algorithm (data structure layout,
> floating-point calculations, etc.) and on the available hardware options
> and their architecture.
>
> One criterion for choosing is the number of calculations per memory
> access. The higher this value, the better suited the algorithm is for a
> GPU rather than a CPU, and vice versa.
>
> I suggest you post this question on a computer architecture forum.
>
> Thanks
> Varun
>
> On Mon, Apr 9, 2012 at 1:21 PM, vikas <vikas.rastogi2...@gmail.com> wrote:
>
>> Hey Arun, Ilya,
>> GPUs are faster because:
>>
>> 1. They are designed for graphics processing, which involves a lot of
>> matrix processing, a simple example being the transformation of
>> matrices into various views (projection, model, and viewport, sometimes
>> needed even in real time), so these computations are done in parallel.
>> 2.
>> All or most of the processing is done at high precision; unless you
>> specify otherwise, everything is a 'double computation', which is quite
>> costly even on a modern CPU ALU.
>> 3. Beyond raw computation, other parallel architectural advantages
>> (e.g. the cache) give ordinary algorithms a better speedup than on a
>> CPU.
>>
>> Hope that clarifies. So if you are planning to start on GPUs, start
>> thinking multi-threaded.
>>
>> Copying data generally involves separate DMA processing. I have worked
>> with USB and PCI 66 MHz CPU/GPU connections, and they did not seem
>> slow; even the Fujitsu CoralPA, which has a very slow DMA and a PCI 33
>> connection, was OK.
>>
>> On Apr 8, 4:04 am, Ilya Albrekht <ilya.albre...@gmail.com> wrote:
>>> Hey Phoenix,
>>>
>>> It is true that current GPUs have far better floating-point throughput
>>> than any general-purpose processor. But when you want to run your
>>> algorithm on the GPU, there is always the overhead of copying data
>>> between CPU and GPU, plus drivers and other system calls, and you can
>>> gain performance despite that overhead only if you have a lot of
>>> calculations (more calculations, less overhead as a percentage). I
>>> would assume that in general you need at least O(n^3) calculations to
>>> gain any performance.
>>>
>>> In my experience, the same thing holds for SSE vectorization: it
>>> doesn't make sense to vectorize a loop of fewer than roughly 25-27
>>> iterations, because the overhead of preparing data and aligning
>>> buffers will be too high.
>>>
>>> On Saturday, 7 April 2012 08:54:20 UTC-7, phoenix wrote:
>>>
>>>> @SAMM: what about general mathematical computations such as matrix
>>>> multiplication, which is O(n^3) as such? How does your explanation
>>>> relate to such math computations, or to any algorithm of at least
>>>> O(n^3)?
>>>>
>>>> On Sat, Apr 7, 2012 at 3:22 AM, SAMM <somnath.nit...@gmail.com> wrote:
>>>>
>>>>> This is because the GPU is multithreaded.
>>>>> In graphics there are three main steps. First is the
>>>>> application-based work, where vertex processing, reading the data,
>>>>> and pixel processing are done.
>>>>>
>>>>> Second comes culling, which determines which portion will be shown
>>>>> given the line of sight. This also checks for any intersection with
>>>>> other objects: for instance, a man standing behind a building should
>>>>> not be visible to us, or only some portion of his body should be
>>>>> shown. This intersection handling is called rendering.
>>>>>
>>>>> The third step is draw, to finally draw the model.
>>>>>
>>>>> These three processes are run multithreaded, in parallel, giving
>>>>> roughly 3x processing speed.
>>>>> You can refer to the link below:
>>>>> http://www.panda3d.org/manual/index.php/Multithreaded_Render_Pipeline
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Algorithm Geeks" group.
>>>>> To post to this group, send email to algogeeks@googlegroups.com.
>>>>> To unsubscribe from this group, send email to
>>>>> algogeeks+unsubscr...@googlegroups.com.
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/algogeeks?hl=en.
>>>>
>>>> --
>>>> "People often say that motivation doesn't last. Well, neither does
>>>> bathing - that's why we recommend it daily."