On 2011-01-30 21:43, "Martin v. Löwis" wrote:
On 30.01.2011 17:54, Alexander Belopolsky wrote:
On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner
<victor.stin...@haypocalc.com>  wrote:
..
We should find a compromise between speed (limit the number of system
calls) and the usability of Python modules.

Do you have measurements that show python spending significant time on
failing open calls?

No; past measurements always showed that this is insignificant, probably
thanks to the operating system caching the relevant directory blocks (so
it doesn't really matter whether you make one or ten lookups per
directory; my guess is that it matters more if you look into ten
directories instead of one).

Dear Python-developers,
I would like to make you aware of one particular problem related to system calls on massively parallel systems. We are developing GPAW (https://wiki.fysik.dtu.dk/gpaw/), a Python-based simulation package, and have tested it on up to tens of thousands of CPU cores. The program uses MPI, so thousands of Python interpreters are launched at start-up. Because all these interpreters execute the same import statements, the huge number of IO-related system calls puts extreme pressure on the file system; as a result, just starting the Python interpreters can take ~45 minutes on ~30 000 CPU cores!
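To see why the load grows so quickly, the following sketch counts the file names a simplified, Python-2-style importer might probe for one top-level module, and then multiplies by the number of imports and MPI ranks. It is a rough model under stated assumptions, not CPython's exact search order: the suffix list defaults to the running interpreter's `importlib.machinery.all_suffixes()`, the helper names `naive_probe_paths` and `job_wide_probes` are hypothetical, and the numbers in the usage lines (two directories, 500 imports per process) are illustrative only. Note that since Python 3.3 the path-based finder caches directory listings, so modern interpreters issue far fewer failing calls than this model suggests for 2.6.

```python
import importlib.machinery
import os


def naive_probe_paths(module_name, search_dirs, suffixes=None):
    """Every path a simplified importer may try for a top-level module,
    in order. A real search stops at the first hit; this is the worst case."""
    if suffixes is None:
        suffixes = importlib.machinery.all_suffixes()
    probes = []
    for directory in search_dirs:
        # Package check: <dir>/<name>/__init__.py
        probes.append(os.path.join(directory, module_name, "__init__.py"))
        # Plain module checks: <dir>/<name><suffix> for each known suffix
        for suffix in suffixes:
            probes.append(os.path.join(directory, module_name + suffix))
    return probes


def job_wide_probes(probes_per_import, imports_per_process, processes):
    """Every MPI rank repeats the same search, so the load on a shared
    file system scales linearly with the number of processes."""
    return probes_per_import * imports_per_process * processes


# Usage with hypothetical numbers:
probes = naive_probe_paths("numpy", ["/opt/a", "/opt/b"], [".so", ".py", ".pyc"])
print(len(probes))  # prints 8: (1 package check + 3 suffixes) x 2 directories
print(job_wide_probes(len(probes), 500, 30_000))  # prints 120000000
```

The point of the model is the product: even modest per-import costs become a very large number of metadata operations once every one of ~30 000 ranks repeats them against the same file system.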

Currently, we have tried to work around the problem either by installing Python and the required additional modules (NumPy and GPAW) on a ramdisk, or by modifying the CPython source (at the moment, version 2.6) so that only a single process performs the system calls and uses MPI to broadcast the results to the other processes (preliminary work in progress).
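The "read once, share from memory" idea can also be approximated without patching C code, using a meta path import hook. The sketch below is a different technique from the CPython modification described above: it assumes the module sources have already been collected into a `{name: source}` dict, which in an MPI job one rank would read from disk and broadcast (for example with mpi4py's `comm.bcast`, not shown). The class name `PrefetchedSourceFinder` and the module name `hello_prefetched` are hypothetical. It only handles top-level pure-Python modules (no packages or extension modules), and it cannot help with the imports the interpreter performs at start-up before user code runs.

```python
import importlib.abc
import importlib.util
import sys


class PrefetchedSourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve top-level pure-Python modules from an in-memory dict.

    Hypothetical sketch: `sources` maps module name -> source text, e.g.
    filled by one MPI rank and broadcast to all others."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours: let the normal finders handle it
        return importlib.util.spec_from_loader(fullname, self, origin="<prefetched>")

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        name = module.__spec__.name
        code = compile(self.sources[name], f"<prefetched {name}>", "exec")
        exec(code, module.__dict__)


# Usage: install the finder ahead of the normal path-based finders.
finder = PrefetchedSourceFinder({"hello_prefetched": "GREETING = 'hi'\n"})
sys.meta_path.insert(0, finder)
import hello_prefetched  # served from memory, no file-system search

print(hello_prefetched.GREETING)  # prints: hi
```

Putting the finder first in `sys.meta_path` means cached modules never reach the path-based search, while anything missing from the dict falls through to the usual import machinery.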

A related problem is that dynamic linking can also be quite expensive (or even unavailable on some systems), so in some cases we have made a small hack to CPython to enable statically linked packages (simple modules can, of course, be included relatively easily in a static Python build).
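To check which modules in a given build avoid both the file search and dynamic loading, one can compare against `sys.builtin_module_names` (modules compiled into the interpreter binary) and inspect the import spec of everything else. The helper `classify_module` below is a hypothetical sketch; it only handles top-level names, since `importlib.util.find_spec` raises `ModuleNotFoundError` for a dotted name whose parent is missing.

```python
import importlib.machinery
import importlib.util
import sys


def classify_module(name):
    """Report how a top-level module would be loaded in this interpreter."""
    if name in sys.builtin_module_names:
        # Linked into the interpreter: no file search, no dlopen().
        return "built-in"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.origin == "frozen":
        return "frozen"
    if spec.origin is None:
        return "namespace package"
    if spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        # Shared library: located via the path search, then loaded dynamically.
        return "extension (dynamically loaded)"
    return "file (source or bytecode)"


for mod in ["sys", "math", "json"]:
    print(mod, "->", classify_module(mod))
```

Whether `math` shows up as built-in or as a dynamically loaded extension depends on how the interpreter was built, which is exactly the difference a static build makes.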

I do not expect these problems to be solved easily for the general CPython interpreter, especially as massively parallel supercomputers are a fairly small niche of Python usage. However, I think it is good to be aware of the problems that a large number of system calls can cause in more specialized uses of Python.

Best regards,
Jussi
--
Jussi Enkovaara, Application Scientist, High Performance Computing, CSC
PO. BOX 405 02101 Espoo, Finland, Tel +358 9 457 2935, fax +358 9 457 2302
CSC - IT Center for Science, www.csc.fi, e-mail: jussi.enkova...@csc.fi
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev