General-purpose programming on a GPU is useful for algorithms that are
computationally intensive, can be parallelized with little overhead, have
fine-grained per-thread work, have little (and similar) control flow per
thread, and at the same time access data regularly (for example, array-based
data is regular, while pointer-based data is irregular). GPUs use massive
multi-threading to hide memory latency.
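To make the regular-vs-irregular distinction above concrete, here is a minimal Python sketch (the function and class names are illustrative, not from any GPU API): summing an array touches contiguous, predictable addresses, the pattern a GPU can coalesce across many threads, while summing a linked list chases pointers, a serial dependency chain that GPUs handle poorly.

```python
# Regular access: array elements sit contiguously in memory, so the
# addresses are predictable (coalesced access when mapped to GPU threads).
def array_sum(values):
    total = 0
    for v in values:          # stride-1, predictable addresses
        total += v
    return total

# Irregular access: the next address is only known after loading the
# current node, so the traversal is a serial pointer-chasing chain.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def list_sum(head):
    total = 0
    node = head
    while node is not None:   # pointer chasing: hard to parallelize
        total += node.value
        node = node.next
    return total

data = [1, 2, 3, 4]
head = None
for v in reversed(data):
    head = Node(v, head)

print(array_sum(data))  # 10
print(list_sum(head))   # 10
```

Both loops compute the same sum; the difference that matters for a GPU is purely the memory-access pattern.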

CPUs, on the other hand, are essentially meant to run control-intensive code
(lots of conditional branches: branch predictors help here), irregular memory
accesses (the memory hierarchy of register file and L1/L2/L3 caches helps
here), and coarse-grained multi-threaded applications (multi-threaded
processor architectures and Hyper-Threading help here). The memory hierarchy
plus hardware multi-threading is used to hide memory latency.

For a given algorithm, thousands of threads run on a GPU, compared to the
handful (at most a few hundred) that would run on a CPU.

There is no general rule saying that an algorithm of O(n^3) complexity will
run faster on a CPU or a GPU. My answer would be: it depends. It depends on
many other properties of the algorithm (data-structure layout, floating-point
calculations, etc.) and on the available hardware options and their
architecture.

One criterion for choosing is the number of calculations per memory access.
The higher this value, the better suited the algorithm is for a GPU rather
than a CPU, and vice versa.
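As a back-of-the-envelope sketch of that criterion (in Python, using the standard operation counts, not measured numbers): a naive n x n matrix multiplication does about 2n^3 floating-point operations over about 3n^2 matrix elements, so its calculations-per-memory-access ratio grows with n, whereas elementwise vector addition stays at a small constant ratio regardless of n.

```python
def arithmetic_intensity_matmul(n):
    """Flops per element touched for a naive n x n matrix multiply."""
    flops = 2 * n ** 3        # n^3 multiply-add pairs
    elements = 3 * n ** 2     # two input matrices plus one output
    return flops / elements

def arithmetic_intensity_vector_add(n):
    """Flops per element touched for elementwise vector addition."""
    flops = n                 # one add per output element
    elements = 3 * n          # two inputs plus one output
    return flops / elements

print(arithmetic_intensity_matmul(1024))      # ~682.7, grows with n
print(arithmetic_intensity_vector_add(1024))  # ~0.33, constant in n
```

By this measure, matrix multiplication becomes more GPU-friendly as n grows, while vector addition is dominated by memory traffic at any size.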

I suggest you ask this question on a computer architecture forum.

Thanks
Varun

On Mon, Apr 9, 2012 at 1:21 PM, vikas <vikas.rastogi2...@gmail.com> wrote:

> Hey Arun, Ilya,
>   the GPUs are faster because:
>
> 1. they are designed for graphics processing, which involves a lot of
> matrix processing; a simple example is transforming matrices into
> various views (projection, model, and viewport, sometimes needed even
> in real time), so these computations are done in parallel
> 2. all or most processing is done at high precision, and unless one
> specifies otherwise, all computations are 'double computations', which
> are quite costly even in a modern CPU ALU
> 3. beyond the raw computation, many other parallel architectural
> advantages (e.g. caches) give normal algorithms a better speedup than
> on a CPU
>
> Hope that clarifies. So if you are planning to start on GPU
> programming, start thinking multi-threaded.
>
> Copying data generally involves a separate DMA engine. I have worked
> with USB and 66 MHz PCI connections between CPU and GPU, and they did
> not seem slow. Even the Fujitsu CoralPA was OK, and it has a very slow
> DMA and a 33 MHz PCI connection.
>
>
> On Apr 8, 4:04 am, Ilya Albrekht <ilya.albre...@gmail.com> wrote:
> > Hey Phoenix,
> >
> > It is true that current GPUs have far better floating-point throughput
> > than any general-purpose processor. But when you want to run your
> > algorithm on the GPU, there is always an overhead of copying data
> > between the CPU and GPU, drivers, and other system calls; you can
> > still gain performance despite that overhead if you have a lot of
> > calculations (more calculations, smaller overhead percentage). And I
> > assume that in general you have to do at least O(n^3) calculations to
> > gain any performance.
> >
> > In my experience, the same thing applies to SSE vectorization: it
> > doesn't make sense to vectorize a loop of fewer than ~25-27
> > iterations, because the overhead of preparing data and aligning
> > buffers will be too high.
> >
> >
> > On Saturday, 7 April 2012 08:54:20 UTC-7, phoenix wrote:
> >
> > > @SAMM: what about general mathematical computations such as matrix
> > > multiplication, which is O(n^3)? How does your explanation relate to
> > > such math computations, or to any algorithm of at least O(n^3)?
> >
> > > On Sat, Apr 7, 2012 at 3:22 AM, SAMM <somnath.nit...@gmail.com> wrote:
> >
> > >> This is because the GPU is multithreaded. In graphics there are three
> > >> main steps. First comes application-based work, where vertex
> > >> processing, reading the data, and pixel processing are done.
> > >> Second comes culling, which determines which portion of the scene will
> > >> be shown given the line of sight. This also checks for intersections
> > >> with other objects. For instance, if a man is standing behind a
> > >> building, he should not be visible, or only some portion of his body
> > >> will be shown; handling this intersection is called rendering.
> > >> The third step is draw, to finally draw the model.
> > >> These three stages run in parallel on multiple threads, giving up to
> > >> 3x processing speed.
> > >> You can refer to this link below:
> > >> http://www.panda3d.org/manual/index.php/Multithreaded_Render_Pipeline
> >
> > >> --
> > >> You received this message because you are subscribed to the Google
> > >> Groups "Algorithm Geeks" group.
> > >> To post to this group, send email to algogeeks@googlegroups.com.
> > >> To unsubscribe from this group, send email to
> > >> algogeeks+unsubscr...@googlegroups.com.
> > >> For more options, visit this group at
> > >> http://groups.google.com/group/algogeeks?hl=en.
> >
> > > --
> > >  "People often say that motivation doesn't last. Well, neither does
> > > bathing - that's why we recommend it daily."
>
