A little clarification. Here is the function I mean when I say "global add"; I
should have said "global atomic add of floats".
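inline void GAtomicAdd(volatile __global float *source, const float operand) {
    // Each union lets the same 32 bits be viewed as a float or an unsigned int.
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    // Read, add, then try to swap the sum in. If another work-item changed
    // *source in the meantime, atomic_cmpxchg returns a different value
    // and the loop retries.
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile __global unsigned int *)source,
                            prevVal.intVal, newVal.intVal) != prevVal.intVal);
}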
As used in the code:
    GAtomicAdd(&pdose[indz+indx*NZ+indy*NZ*NCOL], urn);
pdose is a large array stored in global memory. urn is the value the current
work-item has calculated and needs to add to the array at that item's current
position. Each work-item takes a random path, and because pdose has millions of
cells while there are only thousands of work-items, a race condition is highly
unlikely. I still have to use the atomic for the times it does occur. According
to the AMD profiler, this function is where nearly all of the time is spent.
-------- Original message --------
From: CRV§ADER//KY <[email protected]>
Date: 08/13/2015 3:44 AM (GMT-05:00)
To: Joe Haywood <[email protected]>
Cc: Pyopencl <[email protected]>
Subject: Re: [PyOpenCL] Opinions
Excuse my confusion, but what do you mean by global addition? C = a + b,
where a, b and c are vectors of single-precision floats in shared memory?
Or is it double precision?
On 12 Aug 2015 22:15, "Joe Haywood" <[email protected]> wrote:
I apologize in advance for asking the following questions, because they are not
directly related to PyOpenCL. I also realize opinions can vary widely, but I
think you all might be able to help me. I am planning to buy a new laptop for
programming at home. At work I use a workstation with an NVIDIA GTX 780 Ti, and
on that hardware I have been able to get my PyOpenCL code to run at nearly the
same speed as my CUDA code. When I ran the same PyOpenCL code on an AMD FirePro
V4800, I saw a serious drop in speed; according to the AMD profiler, the
bottleneck is the global add.

A few websites suggest that using float4s would increase the speed, but using
float4s in this embarrassingly parallel Monte Carlo code is impractical because
of branching. Further investigation on the old CompuBench website (around early
2014) confirmed that global addition was very slow on anything except NVIDIA.
That was nearly two years ago, and CompuBench no longer lists global add as a
benchmark. So, in your experience, is this still the case: is anything except
NVIDIA slow at global additions? Or have AMD and Intel "caught up"?

I cannot find any laptop specced exactly the way I want, but the 2015 MacBook
Pro is close. I just don't want to buy one, run the code, and see it suffer the
same terrible loss of speed. Finally, I noticed on the CompuBench website that
the NVIDIA GTX 980M is equal to or better than the GTX 780 Ti in nearly all
tests. If you have this hardware, can you confirm this with your own code? I can
also run some tests on my machine if someone with a 980M would be willing to
share numbers for comparison.
Again, I apologize for being off topic (private messages might be best), and I
appreciate your help.
Thanks
Reese
Joe Reese Haywood, Ph.D., DABR
Medical Physicist
Johnson Family Cancer Center
Mercy Health Muskegon
1440 E. Sherman Blvd, Suite 300
Muskegon, MI 49444
Phone: 231-672-2019
Email: [email protected]
Confidentiality Notice:
This e-mail, including any attachments is the property of Trinity Health and is
intended for the sole use of the intended recipient(s). It may contain
information that is privileged and confidential. Any unauthorized review, use,
disclosure, or distribution is prohibited. If you are not the intended
recipient, please delete this message, and reply to the sender regarding the
error in a separate email.
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl