On Sat, Dec 5, 2009 at 6:01 PM, Jed Brown <jed at 59a2.org> wrote:

> On Sat, 5 Dec 2009 16:50:38 -0600, Matthew Knepley <knepley at gmail.com>
> wrote:
> > You assign a few threads per element to calculate the FEM
> > integral. You could maintain this unassembled if you only need
> > actions.
>
> You can also store it with much less memory as just values at quadrature
> points.
>
Depends on the quadrature, but yes, there are sometimes better storage
schemes (especially if you have other properties like decay).

> > However, if you want an actual sparse matrix, there are a couple of
> > options
> >
> > 1) Store the unassembled matrix, and run assembly after integration
> > is complete. This needs more memory, but should perform well.
>
> Fine, but how is this assembly done? If it's serial then it would be a
> bottleneck, so you still need the concurrent thing below.
>

Vec assembly can be done on the CPU since it's so small, I think.

> > 2) Use atomic operations to update. I have not seen this yet, so I am
> > unsure how it will perform.
>
> Atomic operations could be used per-entry but this costs on the order of
> 100 cycles on CPUs. I think newer GPUs have atomics, but I don't know
> the cost. Presumably it's at least as much as the latency of a read
> from memory.
>

Not sure. Needs to be explored. Felipe is working on it.

> When inserting in decent sized chunks, it would probably be worth taking
> per-row or larger locks to amortize the cost of the atomics.
> Additionally, you could statically partition the workload and only use
> atomics for rows/entries that were shared.
>

You can use partitioning/coloring techniques to increase the lockless
concurrency.

   Matt

> Jed
>

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
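
The coloring idea in the last reply can be sketched roughly as follows. This is a minimal illustration in plain Python, not PETSc code and not the approach anyone in the thread is actually implementing. The function names `color_elements` and `assemble_by_color`, the 1D mesh, the element matrix, and the dense global matrix are all assumptions made for the example. Elements are colored greedily so that no two elements of the same color share a node. Within one color every element then writes to disjoint matrix entries, so those updates could in principle run concurrently without atomics or locks, with a synchronization point between colors.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a node get different colors."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by elements touching any of this element's nodes.
        used = {colors[o] for n in nodes for o in node_to_elems[n] if colors[o] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, element_matrix, n_nodes):
    """Assemble a dense global matrix one color at a time."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    n_colors = max(colors, default=-1) + 1
    for c in range(n_colors):
        # Elements in this batch share no nodes, so their updates touch
        # disjoint entries of A. On a GPU each element could be given to its
        # own group of threads here; this sketch just loops serially.
        batch = [e for e, col in enumerate(colors) if col == c]
        for e in batch:
            nodes = elements[e]
            for i, gi in enumerate(nodes):
                for j, gj in enumerate(nodes):
                    A[gi][gj] += element_matrix[i][j]
    return A


# Hypothetical 1D mesh: 4 two-node elements on 5 nodes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Element matrix assumed for illustration (unit-length linear element).
element_matrix = [[1.0, -1.0], [-1.0, 1.0]]

colors = color_elements(elements)
A = assemble_by_color(elements, colors, element_matrix, n_nodes=5)
```

The trade-off, under these assumptions, is that each color is a separate pass, so the available parallelism per pass shrinks as the number of colors grows. A static partition with atomics only on shared rows, as suggested in the quoted text, is the alternative when the coloring produces many small batches.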