URL: <http://gna.org/bugs/?23618>
Summary: queuing system for multi processors is not well designed.
Project: relax
Submitted by: tlinnet
Submitted on: Wed 27 May 2015 12:10:57 AM UTC
Category: relax's source code
Specific analysis category: None
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release: Repository: trunk
Discussion Lock: Any
Operating System: All systems

_______________________________________________________

Details:

The queuing system for multi processors does not appear to be well designed.

This was detected in a dispersion analysis: a clustered fit of 74 spins with 100 Monte Carlo simulations. The test was run with the number of multi processors set to 10, with 1 CPU as master.

The problem seems to reside in:

multi.processor.run_queue()
multi.multi_processor.chunk_queue()

The current queuing system takes the 100 Monte Carlo simulations, chunks them up into pieces of 10, and distributes one chunk to each CPU. Each CPU thus has 10 simulations to handle.

The problem is that the simulations are not all equally fast to solve. A CPU will therefore "hang" until all of the simulations in its chunk have finished. This "blocks" the possibility of assigning CPU power to other tasks until all of the simulations have finished.

A suggestion for a "first" fix is not to chunk up the queue, but to let each simulation be handled independently.

In multi/processor.py
--------------
- lqueue = self.chunk_queue(self.command_queue)
- self.run_command_queue(lqueue)
+ #lqueue = self.chunk_queue(self.command_queue)
+ self.run_command_queue(self.command_queue)
-------------

This does not seem to improve the timing much, but it gives a better overview of the process.

It appears that the queuing system can be enhanced even further. The list of jobs in the "Running set" is not replenished before all jobs in the "Running set" have completed. This influences the solving time.

----

Only 20 Monte Carlo simulations were run for the comparison.

/usr/bin/time -p relax_multi bug.py

The running time for 1 CPU, no multi processor:
real 510.94
user 5903.01
sys 133.96

The running time for 1 CPU, 4 multi processor:
real 214.89
user 1786.39
sys 37.09

The running time for 1 CPU, 10 multi processor:
real 108.39
user 1930.21
sys 44.45

The running time for 1 CPU, 4 multi processor with first fix:
real 235.46
user 1892.20
sys 38.58

The running time for 1 CPU, 10 multi processor with first fix:
real 110.50
user 1957.99
sys 43.60

_______________________________________________________

File Attachments:

-------------------------------------------------------
Date: Wed 27 May 2015 12:10:57 AM UTC  Name: bug.bz2  Size: 301kB  By: tlinnet
<http://gna.org/bugs/download.php?file_id=24545>
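
_______________________________________________________

Illustration of the scheduling problem described in the Details:

The three queuing behaviours discussed above (fixed chunks per CPU, a "Running set" that is only refilled once it is empty, and a queue where a free CPU immediately takes the next simulation) can be compared with a small toy model. This is only a sketch under stated assumptions: the function names and the task durations are hypothetical, nothing here is taken from the relax multi package, and communication overhead between master and slaves is ignored. It does not predict the timings measured above, it only shows why uneven simulation times can leave CPUs idle.

```python
import heapq


def chunked_makespan(durations, n_workers):
    """Toy model: split the queue into contiguous chunks, one per worker.

    The total time is set by the slowest chunk.
    """
    size = -(-len(durations) // n_workers)  # ceiling division
    chunks = [durations[i:i + size] for i in range(0, len(durations), size)]
    return max(sum(chunk) for chunk in chunks)


def batched_makespan(durations, n_workers):
    """Toy model: a running set of n_workers tasks is only refilled
    once every task in the set has finished."""
    total = 0
    for i in range(0, len(durations), n_workers):
        total += max(durations[i:i + n_workers])
    return total


def dynamic_makespan(durations, n_workers):
    """Toy model: a worker takes the next task as soon as it is free."""
    free_at = [0.0] * n_workers  # min-heap of times at which workers become free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)


# Hypothetical durations (arbitrary time units): two slow tasks, six fast ones.
durations = [5, 5, 1, 1, 1, 1, 1, 1]
n_workers = 4

print("chunked:", chunked_makespan(durations, n_workers))
print("batched:", batched_makespan(durations, n_workers))
print("dynamic:", dynamic_makespan(durations, n_workers))
```

With these made-up durations, the chunked model needs 10 time units (one worker receives both slow tasks), the batched model needs 6, and the dynamic model needs 5. Whether a real analysis gains anything depends on how unevenly the simulation times are spread and on the per-task communication cost, which this model leaves out.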