Sorry for the many typos.
On Mon, Apr 9, 2012 at 1:53 PM, Varun Nagpal varun.nagp...@gmail.com wrote:
General-purpose (GP) programming on GPUs is useful for algorithms that
are computationally intensive, can be parallelized with little overhead,
have fine-grained per-thread computation, have little and similar control
flow across threads, and access data regularly (for example, array-based
data is regular while pointer-based data is irregular). GPUs use massive
multi-threading to hide memory latency.
CPUs are essentially meant to run control-intensive code (lots of
conditional code; branch predictors help here), irregular memory accesses
(the memory hierarchy of register file and L1, L2, L3 caches helps here)
and coarse-grained multi-threaded applications (multi-threaded processor
architectures and Hyper-Threading help here). The memory hierarchy plus
hardware multi-threading is used to hide memory latency.
For a given algorithm, thousands of threads run on a GPU, compared to a
handful (at most a few hundred) that would run on a CPU.
There is no general rule that says whether an algorithm of O(n^3)
complexity will run faster on a CPU or a GPU. My answer would be: it
depends. It depends on a lot of other things about the algorithm (data
structure layout, floating-point calculations, etc.) and on the available
hardware options and their architecture.
One criterion for choosing is the number of calculations per memory
access. The higher this value, the better suited the algorithm is to a
GPU than to a CPU, and vice versa.
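To make the calculations-per-memory-access criterion concrete, here is a
rough sketch in Python. It is only an illustration: the function names are
hypothetical, and it assumes the idealized case where each input element
is read once and each output element is written once, which real hardware
will not achieve.

```python
# Rough sketch: calculations per memory access for two kernels.
# Assumption: every input element is read once and every output element
# is written once (perfect reuse). Real hardware will do worse.

def matmul_ops_per_access(n: int) -> float:
    """Naive n x n matrix multiply: 2*n^3 operations, 3*n^2 accesses."""
    ops = 2 * n**3        # one multiply and one add per inner-loop step
    accesses = 3 * n**2   # read A, read B, write C
    return ops / accesses


def vector_add_ops_per_access(n: int) -> float:
    """Element-wise add of two length-n vectors: n operations, 3*n accesses."""
    ops = n               # one add per element
    accesses = 3 * n      # read x, read y, write z
    return ops / accesses


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, matmul_ops_per_access(n), vector_add_ops_per_access(n))
```

Under this assumption the ratio for matrix multiplication grows with n
(it is 2n/3), while for vector addition it stays fixed at 1/3 no matter
how large the input is. By the criterion above, the former is the better
GPU candidate.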
I suggest you ask this question on a computer architecture forum.
Thanks
Varun
On Mon, Apr 9, 2012 at 1:21 PM, vikas vikas.rastogi2...@gmail.com wrote:
Hey Arun, Ilya,
GPUs are faster because:
1. They are designed for graphics processing, which involves a lot of
matrix processing, for example transforming matrices into various
views (projection, model and viewport, sometimes even in real time),
so these computations are done in parallel.
2. All or most of the processing is done at high precision, and unless
one specifies otherwise, all are 'double computations', which are quite
costly even in a modern CPU's ALU.
3. Beyond the computations, many other parallel architectural
advantages (e.g. cache) give normal algorithms a better speedup than
on a CPU.
Hope this clarifies things. So if you are planning to start on GPUs,
start thinking in multi-threaded terms.
Copying data generally involves separate DMA processing. I have worked
with USB and PCI 66 MHz connections between CPU and GPU, and they do not
seem to be slow. Even the Fujitsu CoralPA, which has very slow DMA and a
PCI 33 connection, was OK.
On Apr 8, 4:04 am, Ilya Albrekht ilya.albre...@gmail.com wrote:
Hey Phoenix,
It is true that current GPUs have far better floating-point throughput
than any general-purpose processor. But when you want to run your
algorithm on the GPU, there is always the overhead of copying data
between CPU and GPU, drivers and other system calls. You can still gain
performance despite that overhead if you have a lot of calculations (more
calculations, lower overhead percentage). And I assume that in general
you have to do at least O(n^3) calculations to gain any performance.
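The trade-off between copy overhead and calculation volume can be
sketched as a simple cost model in Python. This is only an illustration
under stated assumptions: the function name and every parameter are
hypothetical, compute and transfer are assumed not to overlap, and any
throughput figures you plug in must come from your own hardware.

```python
# Rough cost model for deciding whether offloading to a GPU can pay off.
# Assumptions: compute and transfer do not overlap, throughput is constant,
# and all parameter values are supplied by the caller (none are measured).

def offload_times(ops, bytes_moved, cpu_ops_per_s, gpu_ops_per_s,
                  bus_bytes_per_s, fixed_overhead_s=0.0):
    """Return (cpu_time, gpu_time) in seconds for one run of the kernel."""
    cpu_time = ops / cpu_ops_per_s
    gpu_time = (ops / gpu_ops_per_s            # time computing on the GPU
                + bytes_moved / bus_bytes_per_s  # CPU <-> GPU copies
                + fixed_overhead_s)              # driver and system calls
    return cpu_time, gpu_time


def matmul_offload_times(n, cpu_ops_per_s, gpu_ops_per_s,
                         bus_bytes_per_s, fixed_overhead_s=0.0,
                         bytes_per_element=4):
    """Same model for a naive n x n matrix multiply (2*n^3 operations)."""
    ops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_element  # copy A and B in, C out
    return offload_times(ops, bytes_moved, cpu_ops_per_s, gpu_ops_per_s,
                         bus_bytes_per_s, fixed_overhead_s)
```

In this model the compute term grows as n^3 while the copy term grows
only as n^2, so the overhead percentage shrinks as n grows. For small n
the fixed overhead and the copies dominate, and the CPU wins.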
In my experience, it is the same thing with SSE vectorization: it
doesn't make sense to vectorize a loop that has fewer than ~25-27
iterations, because the overhead of preparing data and aligning buffers
will be too high.
On Saturday, 7 April 2012 08:54:20 UTC-7, phoenix wrote:
@SAMM: What about general mathematical computations such as matrix
multiplication, which is O(n^3)? How do you relate your explanation to
such math computations, or to any algorithm of at least O(n^3)?
On Sat, Apr 7, 2012 at 3:22 AM, SAMM somnath.nit...@gmail.com wrote:
This is because the GPU is multithreaded. In graphics there are three
main steps. First is the application-based work, where vertex processing,
reading the data and pixel processing are done.
Second comes culling, which determines which portion will be shown given
the line of sight. This also checks for any intersection with other
objects. For instance, if a man is standing behind a building, he should
not be visible to us in the graphics, or only some portion of his body
will be shown. This intersection is called rendering.
The third step is draw: finally drawing the model.
These three processes run in parallel on multiple threads, giving 3x the
processing speed.
You can refer to the link below:
http://www.panda3d.org/manual/index.php/Multithreaded_Render_Pipeline
--
You received this message because you are subscribed to the Google Groups
Algorithm Geeks group.
--
People often