ok 'parallelization code and hash table' in google show no real solution, i have another idea,instead of function-unify-two-minterms-and-tag ,just unify the minterms and tag them (which use hash table) later out of the // region, after if it still sucks i will drop list for vectors, making it step by step will show where is the bottleneck, let me just a few time and i will go back to the mailing lists with interesting results i hope....
On Thu, Oct 13, 2022 at 9:40 AM Damien Mattei <damien.mat...@gmail.com> wrote: > ok , i think the problem comes both from my code and from guile parmap so. > Obviously parmap could be slower on other codes because of the nature of > list i think, it is hard to split a list in sublist and send them to thread > and after redo a single list, i better use vector. > As mentioned and forget by me, i apologize, i use an hash table which is a > global variable and mutex can be set on it , no dead lock , here but it > slow down the code than it is dead for speed , but not locked. > > The examples given are good but i do not want to write long and specific > code for //, for // must a ssimple as OpenMP directives (on arrays) not be > a pain in the ass like things i already did at work with GPUs in C or > Fortran,that's nightmare and i do not want to have this in functional > programming. > > ok i will try to modify the code but i do not think there is easy solution > to replace the hash table, any structure i would use would be global and > the problem (this function is not pure, it does a side-effect),at the end i > do not think this code could be // but i'm not a specialist of //. I think > this is a parallelization problem, how can we deal with a global variable > such as an hash table? > > On Wed, Oct 12, 2022 at 11:55 PM Olivier Dion <olivier.d...@polymtl.ca> > wrote: > >> On Wed, 12 Oct 2022, Damien Mattei <damien.mat...@gmail.com> wrote: >> > Hello, >> > all is in the title, i test on a approximately 30000 element list , i >> got >> > 9s with map and 3min 30s with par-map on exactly the same piece of >> > code!? >> >> I can only speculate here. But trying with a very simple example here: >> --8<---------------cut here---------------start------------->8--- >> (use-modules (statprof)) >> (statprof (lambda () (par-map 1+ (iota 300000)))) >> --8<---------------cut here---------------end--------------->8--- >> >> Performance are terrible. I don't know how par-map is implemented, but >> if it does 1 element static scheduling -- which it probably does because >> you pass a linked list and not a vector -- then yeah you can assure that >> thing will be very slow. >> >> You're probably better off with dynamic scheduling with vectors. Here's >> a quick snippet I made for static scheduling but with vectors. Feel >> free to roll your own. >> >> --8<---------------cut here---------------start------------->8--- >> (use-modules >> (srfi srfi-1) >> (ice-9 threads)) >> >> (define* (par-map-vector proc input >> #:optional >> (max-thread (current-processor-count))) >> >> (let* ((block-size (quotient (vector-length input) max-thread)) >> (rest (remainder (vector-length input) max-thread)) >> (output (make-vector (vector-length input) #f))) >> (when (not (zero? block-size)) >> (let ((mtx (make-mutex)) >> (cnd (make-condition-variable)) >> (n 0)) >> (fold >> (lambda (scale output) >> (begin-thread >> (let lp ((i 0)) >> (when (< i block-size) >> (let ((i (+ i (* scale block-size)))) >> (vector-set! output i (proc (vector-ref input i)))) >> (lp (1+ i)))) >> (with-mutex mtx >> (set! n (1+ n)) >> (signal-condition-variable cnd))) >> output) >> output >> (iota max-thread)) >> (with-mutex mtx >> (while (not (< n max-thread)) >> (wait-condition-variable cnd mtx)))) >> (let ((base (- (vector-length input) rest))) >> (let lp ((i 0)) >> (when (< i rest) >> (let ((i (+ i base))) >> (vector-set! output i (proc (vector-ref input i)))) >> (lp (1+ i))))) >> output)) >> --8<---------------cut here---------------end--------------->8--- >> >> -- >> Olivier Dion >> oldiob.dev >> >