On Wednesday 26 February 2014, beranger six wrote:
> I have started work to parallelize (with OpenMP, and then maybe
> CUDA) the LU decomposition in gnucap.
OpenMP looks like a good way to do it. It is simple, and comes with most
compilers, including gcc. Do not use CUDA. Unless I misunderstand, the
licensing of CUDA makes it unsuitable for use in a GNU project.

> I saw your topic about parallelism, and I have some time to do it.
>
> Furthermore, we definitely need faster simulation results for our
> application.
>
> What kind of solution did you have in mind?

Very simple .. identify certain loops that can run in parallel. That is
really all.

You should look at the output of the "status" command to see where the
time is spent, which will show where parallelism could be of benefit and
how much benefit to expect.

In the LU decomposition, running the outermost loop in parallel should
be all that is needed there. But to get enough benefit, model evaluation
should also be parallel, and is likely more important.

> - Is it the "sections" you designed, with row, diagonal, and column?
>   In that case, did you want to use the fact that once all the
>   sections between _lownode[mm] and mm are calculated, we can compute
>   the element?
>
>   In this case we could have a dependence graph (or tree) applied to
>   your storage matrix sections, as is mostly used to parallelize the
>   Gilbert-Peierls algorithm.

I don't think that makes sense here, but you might want to try it.

Remember .. gnucap's matrix solver usually does low rank updates and
partial solutions. If you lose this feature, it could make it so much
slower that any parallel operation cannot come close to recovering the
loss.

The simpler solver used for AC analysis is not parallel ready. To
parallelize the AC matrix solution, it may be necessary to switch to the
other lu_decomp, which requires double matrix storage.

> - Is it an iterative method, with the problem that convergence could
>   theoretically take an infinite number of operations? (so maybe not
>   a good way)

No -- not iterative -- except for the standard DC_tran Newton iteration,
which would not change.
> - Is it parallelizing only the map (multiplication between elements)
>   of the dot product, and then maybe parallelizing the reduction
>   (addition between elements)?

I think the overhead of parallelizing the dot product would be too high,
thinking of the multi-thread model. The dot product might be a candidate
for GPU-type processing, but look at "status" to judge whether there is
enough potential benefit before doing this.

> - Did you have in mind applying a permutation matrix to ease the
>   implementation of parallelism, or directly building the best matrix
>   during evaluation of the netlist?

No .. that would probably make it slower. The speed gain of a better
ordering would be offset by the overhead of ordering and the more
complex access.

Also .. remember that gnucap does incremental updates and partial
solutions. The ordering that is optimal for this is different from the
ordering that is optimal for solving the entire matrix.

I am aware of a problem with read-in, where the recursive
"find_looking_out" can waste a lot of time. Again, "status" will tell
you.

> Regards,
>
> Beranger Six

_______________________________________________
Gnucap-devel mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnucap-devel
