I am no longer as pessimistic as I once was. To start, this is a good report on prefix-sum in parallel:
http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf It can solve first-order recurrences optimally. Higher order recurrences can be reduced to first-order, however it is not work optimal (too big by a factor of the order, m), which seems alright to me. Thus we should be able to solve bandwidth m matrices efficiently. Here is a nice implementation of prefix-sum http://code.google.com/p/cudpp/ This all leads me to believe that for a bounded number of nonzeros per row, we can get an efficient algorithm for triangular solve. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20100816/615d43d0/attachment.html>