Brad,

Do not put this high on your agenda. Thanks for raising the issue on GitHub.

I can keep attacking the problem the current way. For my problems, the impact is probably the theoretically expected 10%-15% performance penalty caused by cache misses, because the algorithm reads and writes a matrix by columns (rather than by rows).

I will document the issue in my report on this project, note a projected 10%-15% performance improvement in the future, express lots of hope, and move on.

I can prove that I get a 10% performance improvement on a 6-core machine with Chapel 1.20 if I avoid the reduce for the O(n) operation,

        z^T = ( ( b^T * B ) / g ) / h

but then access memory by rows in the 2nd loop for the O(n^2) operation

        B += b * z^T

So that says that Chapel does give me the speed gain I predict.
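
For anyone following along, here is a minimal sketch of the two loop orders. It is in Python rather than Chapel, the function names are hypothetical, and it is not the code from the project. It only illustrates the access pattern (a reduce down each column versus accumulating across rows), on the assumption that B is stored as a list of rows. It says nothing about timing.

```python
def update_by_columns(B, b, g, h):
    """Column-order version: reduce down each column, then update it.

    Computes z^T = ((b^T * B) / g) / h and applies B += b * z^T in place.
    Every inner loop steps down a column of B, touching one entry per row.
    """
    n = len(B)
    m = len(B[0]) if n else 0
    z = [0.0] * m
    for j in range(m):
        # Reduce: dot product of b with column j of B.
        s = 0.0
        for i in range(n):
            s += b[i] * B[i][j]
        z[j] = (s / g) / h
        # Update column j of B.
        for i in range(n):
            B[i][j] += b[i] * z[j]
    return z


def update_by_rows(B, b, g, h):
    """Row-order version: same result, but every pass walks B by rows."""
    n = len(B)
    m = len(B[0]) if n else 0

    # Accumulate b^T * B one row at a time instead of reducing per column.
    z = [0.0] * m
    for i in range(n):
        bi = b[i]
        row = B[i]
        for j in range(m):
            z[j] += bi * row[j]

    # Scale: z^T = (z^T / g) / h.
    for j in range(m):
        z[j] = (z[j] / g) / h

    # Second loop: B += b * z^T, again one row at a time.
    for i in range(n):
        bi = b[i]
        row = B[i]
        for j in range(m):
            row[j] += bi * z[j]
    return z


if __name__ == "__main__":
    B1 = [[1.0, 2.0], [3.0, 4.0]]
    B2 = [[1.0, 2.0], [3.0, 4.0]]
    b = [1.0, 1.0]
    z1 = update_by_columns(B1, b, 2.0, 1.0)
    z2 = update_by_rows(B2, b, 2.0, 1.0)
    print(z1 == z2, B1 == B2)
```

Both functions return z and leave the same updated B; only the order in which B is traversed differs.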

The worry is that I get a 1000% performance drop on a 36-core machine with Chapel 1.22.

This is doing a Singular Value Decomposition on a 3200*3200 matrix.

I will document it all more fully at the end of next week, but I do not really have the insight or experience in Chapel (or the mix of systems with old and new releases to do testing) to assist much with solving the problem.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
