A little clarification. Here is the function I was referring to when I said 
"global add"; I should have said global atomic add of floats.

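inline void GAtomicAdd(volatile __global float *source, const float operand) {
    /* The unions reinterpret a float's bits as an unsigned int, because
       atomic_cmpxchg only operates on integer types. */
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;

    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;

    /* Retry until no other work-item has changed *source between the read
       and the compare-and-swap. */
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile __global unsigned int *)source,
                            prevVal.intVal, newVal.intVal) != prevVal.intVal);
}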

As used in the code:
GAtomicAdd(&pdose[indz+indx*NZ+indy*NZ*NCOL],urn);
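For readers following the indexing, here is a minimal host-side C sketch of 
the flat offset used in that call. The dimension values, the NROW name, and 
the dose_index helper are hypothetical and chosen only for illustration; the 
reading of NZ as the depth count and NCOL as the x count is an assumption 
based on the expression above.

```c
#include <stddef.h>

/* Hypothetical grid dimensions, for illustration only. */
enum { NZ = 4, NCOL = 3, NROW = 2 };

/* Flat offset of cell (indx, indy, indz), mirroring the expression
   indz + indx*NZ + indy*NZ*NCOL: z varies fastest, then x, then y. */
static size_t dose_index(size_t indx, size_t indy, size_t indz) {
    return indz + indx * NZ + indy * NZ * NCOL;
}
```

With this layout, neighbouring z cells sit next to each other in memory, and 
one step in y skips a whole NZ*NCOL slab.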

The pdose array is large and stored in global memory, and urn is the value the 
current work-item has calculated and needs to add to the array at its current 
position. Each work-item takes a random path, so a race condition is highly 
unlikely: pdose has millions of cells, compared with only thousands of 
work-items. I still have to use the atomic add for the times a collision does 
occur. This is the function in which the AMD profiler says all the time is 
spent.
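To make the retry logic concrete outside of OpenCL, here is a rough host-side 
analogue in C11 using <stdatomic.h>. It is only a sketch of the same 
compare-and-swap idea, not the kernel code itself: the helper names 
(atomic_add_float, store_float, load_float) are made up for this example, and 
it assumes a 32-bit float.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Store a float's bit pattern into a 32-bit atomic word. */
static void store_float(_Atomic uint32_t *target, float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    atomic_store(target, bits);
}

/* Read the 32-bit atomic word back as a float. */
static float load_float(_Atomic uint32_t *target) {
    uint32_t bits = atomic_load(target);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Add operand to the float held in *target with a compare-and-swap loop. */
static void atomic_add_float(_Atomic uint32_t *target, float operand) {
    uint32_t expected = atomic_load(target);
    uint32_t desired;
    do {
        float current;
        memcpy(&current, &expected, sizeof current);  /* bits -> float */
        float sum = current + operand;
        memcpy(&desired, &sum, sizeof desired);       /* float -> bits */
        /* On failure, expected is refreshed with the word's current value
           and the sum is recomputed on the next pass. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
}
```

When no other thread touches the same cell, the loop body runs once, so in the 
uncontended case the cost is that of the compare-and-swap itself rather than 
of repeated retries.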




-------- Original message --------
From: CRV§ADER//KY <[email protected]>
Date: 08/13/2015 3:44 AM (GMT-05:00)
To: Joe Haywood <[email protected]>
Cc: Pyopencl <[email protected]>
Subject: Re: [PyOpenCL] Opinions


Excuse my confusion, but what do you mean by global addition? c = a + b, 
where a, b, and c are vectors of single-precision floats in shared memory?
Or is it double precision?

On 12 Aug 2015 22:15, "Joe Haywood" <[email protected]> wrote:
I apologize in advance for asking the following questions, because they are 
not directly related to PyOpenCL. Also, I realize opinions can be very diverse, 
but I think you all might be able to help me.

I am planning on purchasing a new laptop for programming at home. At work I 
currently use a workstation with an NVIDIA 780 Ti, and I have been able to get 
my PyOpenCL code to run at nearly the same speed as my CUDA code on this 
hardware. I have tried running the PyOpenCL code on an AMD FirePro V4800 and 
see serious speed degradation. According to the AMD profiler, the bottleneck is 
the global add. Also, a few websites suggest that using float4s would increase 
the speed, but programming float4s in this embarrassingly parallel Monte Carlo 
code is impractical due to branching.

Further investigation using the old CompuBench website (early 2014 or so) 
confirmed that global addition on anything except NVIDIA was very slow. That 
was nearly 2 years ago, and the CompuBench website no longer lists global add 
as an evaluation. So, in your experience, is it still the case that anything 
except NVIDIA will be slow at global additions? Or have AMD and Intel "caught 
up"?

I cannot find any laptops spec'd exactly the way I want, but the 2015 MacBook 
Pro is close. I just don't want to buy one, run the code, and see that it also 
suffers a terrible loss of speed. Finally, I noticed on the CompuBench website 
that the NVIDIA GTX 980M is equal to or better than the GTX 780 Ti in nearly 
all tests. If you have this hardware, can you confirm this with your own code? 
I can run some tests on my computer if someone with a 980M would be willing to 
give me numbers.

Again, I apologize for being off topic, private messages might be best, and I 
appreciate your help.

Thanks
Reese

Joe Reese Haywood, Ph.D., DABR
Medical Physicist
Johnson Family Cancer Center
Mercy Health Muskegon
1440 E. Sherman Blvd, Suite 300
Muskegon, MI 49444
Phone: 231-672-2019
Email: [email protected]


Confidentiality Notice:
This e-mail, including any attachments is the property of Trinity Health and is 
intended for the sole use of the intended recipient(s). It may contain 
information that is privileged and confidential.  Any unauthorized review, use, 
disclosure, or distribution is prohibited. If you are not the intended 
recipient, please delete this message, and reply to the sender regarding the 
error in a separate email.

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl

