[algogeeks] Re: GPU doubt

2012-04-09 Thread vikas
Hey Arun, Ilya,
   GPUs are faster because:

1. They are designed for graphics processing, which involves a lot of
matrix work; a simple example is transforming vertices through the
model, view/projection and viewport matrices, sometimes even in real
time. These computations are done in parallel (a minimal CUDA sketch
of this follows below the list).
2. Most of the processing is done at high precision; unless you specify
otherwise, everything is a double-precision computation, which is quite
costly even on a modern CPU ALU.
3. Beyond raw computation, a lot of other architectural advantages
(e.g. the memory/cache hierarchy) give ordinary algorithms better
speedups than a CPU does.
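To make point 1 concrete, here is a minimal CUDA sketch (kernel and
variable names are mine, purely illustrative) of the per-vertex matrix
transform a GPU runs with one thread per vertex:

// Hedged sketch: one thread per vertex applies a 4x4 transform matrix
// (e.g. a model-view-projection matrix). Names are illustrative only.
__global__ void transform_vertices(const float4 *in, float4 *out,
                                   const float *m /* 4x4, row-major */,
                                   int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 v = in[i];
    out[i] = make_float4(
        m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
        m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
        m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
        m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w);
}

// launch: one thread per vertex
// transform_vertices<<<(n + 255) / 256, 256>>>(d_in, d_out, d_matrix, n);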

Hope that clarifies it. If you are planning to start on GPU programming,
start thinking multi-threaded.

Copying data is generally handled by a separate DMA engine. I worked
with USB and 66 MHz PCI connections between CPU and GPU, and they did
not seem slow; even the Fujitsu CoralPA was fine, despite its very slow
DMA and a 33 MHz PCI connection.
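On that copy overhead: a minimal sketch, assuming the CUDA runtime API,
of timing just the host-to-device transfer before any kernel runs (the
buffer size and variable names are made up for illustration):

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    const size_t n = 1 << 24;                 // ~16M floats = 64 MB
    float *h = (float *)malloc(n * sizeof(float));
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("H2D copy of %zu MB took %.2f ms\n",
           n * sizeof(float) >> 20, ms);

    cudaFree(d);
    free(h);
    return 0;
}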


On Apr 8, 4:04 am, Ilya Albrekht ilya.albre...@gmail.com wrote:
 Hey Phoenix,

 It is true that current GPUs have far better floating-point throughput than
 any general-purpose processor. But when you want to run your algorithm on the
 GPU there is always the overhead of copying data between CPU and GPU, driver
 and other system calls; you can still gain performance despite that overhead
 if you have a lot of calculations (more calculations, lower overhead
 percentage). In general I assume you have to do at least O(n^3) calculations
 to gain any performance.
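A hedged back-of-envelope model of that amortization (the PCIe bandwidth
and GFLOP/s figures below are placeholders I picked, not measurements):
for an n x n matrix multiply you move O(n^2) data but do O(n^3) work, so
the copy cost shrinks relative to compute as n grows.

#include <cstdio>

// Offload wins roughly when copy_time + gpu_compute_time < cpu_compute_time.
int main()
{
    for (long n = 256; n <= 8192; n *= 2) {
        double flops  = 2.0 * n * n * n;              // n x n matmul
        double bytes  = 3.0 * n * n * sizeof(float);  // A, B in; C out
        double copy_s = bytes / 8e9;                  // ~8 GB/s PCIe (assumed)
        double gpu_s  = flops / 500e9;                // ~500 GFLOP/s (assumed)
        double cpu_s  = flops / 50e9;                 // ~50 GFLOP/s (assumed)
        printf("n=%5ld  copy+gpu=%.4fs  cpu=%.4fs\n", n, copy_s + gpu_s, cpu_s);
    }
    return 0;
}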

 In my experience the same thing applies to SSE vectorization: it doesn't
 make sense to vectorize a loop of fewer than ~25-27 iterations, because the
 overhead of preparing data and aligning buffers is too high.
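For the SSE point, a minimal sketch of what such a vectorized loop looks
like; the ~25-iteration threshold is Ilya's empirical number, not
something this snippet demonstrates by itself:

#include <immintrin.h>

// Vectorized array add: 4 floats per SSE instruction, plus a scalar
// remainder loop. Short loops spend most of their time in setup and
// the remainder, which is why vectorizing them rarely pays off.
void add_arrays(const float *a, const float *b, float *c, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)            // scalar remainder
        c[i] = a[i] + b[i];
}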

 On Saturday, 7 April 2012 08:54:20 UTC-7, phoenix wrote:

  @SAMM: what about general mathematical computations such as matrix
  multiplication, which is O(n^3)? How does your explanation apply to such
  computations, or to any algorithm of at least O(n^3)?

  On Sat, Apr 7, 2012 at 3:22 AM, SAMM somnath.nit...@gmail.com wrote:

  This is because the GPU is multithreaded. In graphics there are three main
  steps. First comes the application stage, where vertex processing, reading
  the data and pixel processing are done.
  Second comes culling, which determines which portion of the scene will be
  shown given the line of sight. This also checks for intersections with
  other objects: for instance, a man standing behind a building should not
  be visible, or only part of his body should be shown. This intersection is
  called rendering.

  The third step is draw, to finally draw the model.

  These three stages run in parallel on separate threads, giving up to 3x
  processing speed; a small sketch of such a pipeline follows the link
  below.
  You can refer to the link below:
 http://www.panda3d.org/manual/index.php/Multithreaded_Render_Pipeline
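As a rough illustration only (not how Panda3D actually implements its
pipeline), here is a tiny sketch using std::thread that runs the app,
cull and draw stages of three different frames concurrently, which is
where the up-to-3x throughput figure comes from:

#include <thread>
#include <cstdio>

// Hypothetical stage function; a real engine would do vertex processing,
// culling and draw-call submission here instead of printing.
static void stage(const char *name, int frame, int total)
{
    if (frame >= 0 && frame < total)
        std::printf("%-4s frame %d\n", name, frame);
}

int main()
{
    const int total = 6;
    // Each iteration works on three frames at once: while frame f is in the
    // app stage, frame f-1 is being culled and frame f-2 is being drawn.
    for (int f = 0; f < total + 2; ++f) {
        std::thread app (stage, "app",  f,     total);
        std::thread cull(stage, "cull", f - 1, total);
        std::thread draw(stage, "draw", f - 2, total);
        app.join(); cull.join(); draw.join();
    }
    return 0;
}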


  --
   People often say that motivation doesn't last. Well, neither does
  bathing - that's why we recommend it daily.




Re: [algogeeks] Re: GPU doubt

2012-04-09 Thread Varun Nagpal
Sorry for all the typos.

On Mon, Apr 9, 2012 at 1:53 PM, Varun Nagpal varun.nagp...@gmail.comwrote:

 General-purpose (GP) programming on a GPU is useful for algorithms that are
 computationally intensive, can be parallelized with little overhead, have
 fine-grained per-thread work, have little (and similar) control flow per
 thread, and at the same time use regular data access (for example,
 array-based data is regular while pointer-based data is irregular; see the
 sketch below). GPUs use massive multi-threading to hide memory latency.
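To illustrate the regular-vs-irregular distinction (the kernels and names
below are my own sketch, not from Varun's post): in the first kernel,
neighbouring threads read neighbouring array elements, so the loads
coalesce; in the second, each thread chases its own linked list, so every
load is a scattered, dependent access the GPU cannot coalesce.

struct Node { float val; Node *next; };

// Regular: array-based, stride keeps each warp's loads coalesced.
__global__ void sum_array(const float *a, float *out, int per_thread, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = 0.0f;
    for (int k = 0; k < per_thread; ++k)
        s += a[k * n + i];
    out[i] = s;
}

// Irregular: pointer-based, every hop is a dependent, scattered load.
__global__ void sum_list(Node *const *heads, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = 0.0f;
    for (Node *p = heads[i]; p != nullptr; p = p->next)
        s += p->val;
    out[i] = s;
}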

 CPUs, on the other hand, are essentially meant to run control-intensive code
 (lots of conditional code; branch predictors help here), irregular memory
 accesses (the memory hierarchy of register file and L1/L2/L3 caches helps
 here) and coarse-grained multi-threaded applications (multi-threaded
 processor architectures and HyperThreading help here). There, the memory
 hierarchy plus hardware multi-threading is what hides memory latency.

 For a given algorithm, thousands of threads run on a GPU compared to the
 handful (at most a few hundred) that would run on a CPU.

 There is no general rule saying that an algorithm of O(n^3) complexity will
 run faster on the CPU or on the GPU; my answer would be that it depends. It
 depends on a lot of other things about the algorithm (data-structure layout,
 floating-point calculations, etc.) and on the available hardware and its
 architecture.

 One criterion for choosing would be to look at the calculations per memory
 access. The higher this value, the better suited the algorithm is to a GPU
 rather than a CPU, and vice versa.
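That ratio is usually called arithmetic intensity (flops per byte moved).
A tiny sketch of the naive counts (not of any tuned kernel) showing why
matrix multiply is a much better GPU candidate than a vector add:

#include <cstdio>

int main()
{
    long n = 1024;

    // Vector add: n flops over ~3n floats moved -> stays ~0.08 flop/byte.
    double add_intensity = (double)n / (3.0 * n * sizeof(float));

    // n x n matrix multiply: ~2n^3 flops over ~3n^2 floats -> grows with n.
    double mm_intensity = (2.0 * n * n * n) / (3.0 * n * n * sizeof(float));

    printf("vector add : %.3f flop/byte\n", add_intensity);
    printf("matmul     : %.1f flop/byte (n=%ld)\n", mm_intensity, n);
    return 0;
}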

 I suggest you also ask this question on a computer architecture forum.

 Thanks
 Varun
