Steffen Daode Nurpmeso <sdao...@googlemail.com> added the comment:

Yet another bug of Mac OS X: it sometimes creates messed-up sparse
regions:

    14:00 ~/tmp/test $ ~/src/cpython/python.exe test_mmap.py
    ..
    14:01 ~/tmp/test $ zsum32 py-mmap-testfile
    Adler-32 <db8d743c> CRC-32 <78ebae7a> -- py-mmap-testfile
    14:03 ~/tmp/test $ ./test_mmap
    Size 4294971396/0x100001004: open. lseek. write. fsync. fstat. mmap. [0]. [s.st_size-4]. munmap.
    14:04 ~/tmp/test $ zsum32 c-mmap-testfile
    Adler-32 <14b9018b> CRC-32 <c6e340bf> -- c-mmap-testfile
    14:08 ~/tmp/test $ hexdump -C -s 4000 -n 128 c-mmap-testfile
    00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00001020
    14:08 ~/tmp/test $ hexdump -C -s 4000 -n 128 py-mmap-testfile
    00000fa0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00001000  db db db db db db db db  db db db db db db db db  |................|
    *
    00001020

Conclusions:

1. It is unwise to create memory regions greater than
       hw.usermem=1651843072 // 2
   and extremely unwise to do so for regions greater than
       hw.user_wire_limit=1811939328 // 2
   Exceed this limit and Mac OS X effectively enters an endless loop
   which may cause so much paging activity that the system is almost
   locked.
   (P.S.: if you invoke diff(1) on two extremely large files you may
   produce an unkillable process which survives SIGKILL and
   "Activity Monitor" initiated "Force Quit"s; not to mention
   termination of the parent shell.)

2. Mac OS X does not reliably produce sparse files.
   If the attached files 11277.mmap-2.c and 11277.mmap-2.py are
   modified not to unlink(2) the produced files (not hard for the
   Python version), then:

       cmp --verbose py-mmap-testfile c-mmap-testfile | wc
          95832  287496 1820808

3. For sparse files at least, the VM system of Mac OS X is unable to
   create an accessible mmap(2)ing if the size of the mapping is in
   the inclusive range
       UINT32_MAX+1 .. UINT32_MAX + PAGESIZE (== 4096)
   and the file has been written to.
   Closing the file and reopening it makes the mapping not only
   succeed but also be accessible (talking about Python).

4. If you choose a size which does not fail immediately, and you do
   not reopen the file but only instrument mmapmodule.c, then:

       subscript self=0x100771350
       CALCULATED SUBSCRIPT 4095
       subscript self=0x100771350
       CALCULATED SUBSCRIPT 4096
       Bus Error

   Thus, accessing the first byte of the second page causes Python
   to fail with SIGBUS, *even* if you explicitly fsync() the fd in
   new_mmap_object(); the fstat(2) code runs anyway.
   The C version does *not* have this problem, here fsync() alone
   does the magic.

5. Python's C code: mumble mumble mumble.
   That really needs to be said at least.

6. The error is in mmapmodule.c, function new_mmap_object().
   It is the call to mmap(2).
   Whether i dup(2) or not. Whatever i do.
   Even if i reduce new_mmap_object() to the running code from
   11277.mmap-2.c:

       if (fd != -1 && fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
               map_size == 0)
           map_size = st.st_size;
       fprintf(stderr, "before mmap(2): size=%lu,fd=%d\n",
               (size_t)map_size, fd);
       {void *addr = mmap(NULL, (size_t)map_size, PROT_READ,
                          MAP_SHARED, fd, 0);
       fprintf(stderr, "after mmap(2): size=%lu,fd=%d got address=%p\n",
               (size_t)map_size, fd, addr);
       {size_t j;
       for (j = 0; j < map_size; ++j) {
           char x;
           if (j % 1024 == 0)
               fprintf(stderr, "INDEX %lu\n", j);
           x = ((volatile char*)addr)[j];
       }
       fprintf(stderr, "PASSED ALL INDICES\n");
       exit(1);
       }
       }
       ...

       17:41 ~/tmp/test $ ~/src/cpython/python.exe 11277.mmap-2.py
       DESCRIPTOR FLAGS WILL BE 0
       DESCRIPTOR FLAGS WILL BE 0
       Start: time.struct_time(tm_year=2011, tm_mon=4, tm_mday=16, tm_hour=15, tm_min=41, tm_sec=22, tm_wday=5, tm_yday=106, tm_isdst=0)
       Testing file size 4294971400: DESCRIPTOR FLAGS WILL BE 1538
       new_mmap_object
       _GetMapSize o=0x1001f5d10
       before mmap(2): size=4294971396,fd=3
       after mmap(2): size=4294971396,fd=3 got address=0x101140000
       INDEX 0
       INDEX 1024
       INDEX 2048
       INDEX 3072
       INDEX 4096
       Bus error

7. Note the C version also works if i prepend many malloc(3) calls.

8. I have no idea what Python does here.
   Maybe it's ld(1) and dynamic module-loading related.
   Maybe Apple's VM gets confused about memory regions if several
   conditions come together.
   I have no idea of what Python does along its way to initialize
   itself. It's a lot.
   And i'm someone who did not even look into Doc/c-api/ at all yet,
   except for a
       grep -Fr tp_as_buf Doc/
   today (the first version of the iterate-cpl-buffer used the buffer
   interface).
   So please explain any comments you might give.
   Maybe i'll write a patch to add tests to test_mmap.py.
   Besides that i'm out of this.

9. Maybe it's really better to simply skip this on Mac OS X.

Z. ... and maybe someone with a name should ask someone with a
   name-name-name to ask those Californian ocean surfers to fix at
   least some of the OS bugs?
   My bug reports are not even acknowledged by Opera, even if i
   attach reproducible scripts or URLs...

----------
Added file: http://bugs.python.org/file21683/11277.mmap-2.c
Added file: http://bugs.python.org/file21684/11277.mmap-2.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11277>
_______________________________________
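
Conclusion 2 (sparse files are not produced reliably) can be checked
without comparing whole files, by looking at how much space the
filesystem actually allocated. The following is only a minimal sketch,
assuming a POSIX system where os.stat() reports st_blocks (documented
by Python as a count of 512-byte blocks). The helper names
make_file_with_hole and allocation_report are made up for this
illustration and are not part of the attached scripts.

```python
import os
import tempfile


def make_file_with_hole(path, size):
    """Create a file of `size` bytes by seeking past the start and
    writing the last 4 bytes, as the attached test scripts do."""
    with open(path, "wb") as f:
        f.seek(size - 4)
        f.write(b"abcd")
        f.flush()
        os.fsync(f.fileno())


def allocation_report(path):
    """Return (logical_size, allocated_bytes) for `path`.

    Assumption: st_blocks is available and counts 512-byte units,
    which holds on Linux and Mac OS X but not on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # A small size is used here so the demo is harmless; the sizes
    # from the report (just above 2**32) can be substituted.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "hole-testfile")
        make_file_with_hole(path, 1 << 20)
        logical, allocated = allocation_report(path)
        print("logical size:   ", logical)
        print("allocated bytes:", allocated)
        if allocated < logical:
            print("file appears to be sparse")
        else:
            print("file appears to be fully allocated")
```

Whether the file turns out sparse depends on the filesystem, so the
sketch reports both numbers instead of assuming a result.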
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>

#define PATH        "c-mmap-testfile"
#define PAGESIZE    4096

static void sighdl(int);

/* On SIGSEGV/SIGBUS: restore the default handlers and remove the file */
static void
sighdl(int signo)
{
    const char errmsg[] = "\nSignal occurred, cleaning up\n";
    (void)signo;
    (void)signal(SIGSEGV, SIG_DFL);
    (void)signal(SIGBUS, SIG_DFL);
    write(2, errmsg, sizeof(errmsg)-1);
    (void)unlink(PATH);
    return;
}

int
main(void)
{
    int fd, estat = 0;
    void *addr;
    size_t i;
    struct stat s;
    /* *Final* sizes (string written after lseek(2): "abcd") */
    const size_t *ct, tests[] = {
        /* Tested good */
        //0x100000000 - PAGESIZE - 5,
        //0x100000000 - 4,
        //0x100000000 - 3,
        //0x100000000 - 1,
        0x100000000 + PAGESIZE + 4,
        //0x100000000 + PAGESIZE + 5,
        /* Tested bad */
        //0x100000000,
        //0x100000000 + PAGESIZE,
        //0x100000000 + PAGESIZE + 1,
        //0x100000000 + PAGESIZE + 3,
        0
    };

    if (signal(SIGSEGV, &sighdl) == SIG_ERR)
        goto jerror;
    if (signal(SIGBUS, &sighdl) == SIG_ERR)
        goto jerror;

    for (ct = tests; *ct != 0; ++ct) {
        fprintf(stderr, "Size %zu/0x%zX: open", *ct, *ct);
        fd = open(PATH, O_RDWR|O_TRUNC|O_CREAT, 0666);
        if (fd < 0)
            goto jerror;
        fprintf(stderr, ". ");

        /* The file was just truncated, so SEEK_END equals offset 0 */
        fprintf(stderr, "lseek");
        if (lseek(fd, (off_t)(*ct-4), SEEK_END) < 0)
            goto jerror;
        fprintf(stderr, ". ");

        fprintf(stderr, "write");
        if (write(fd, "abcd", 4) != 4)
            goto jerror;
        fprintf(stderr, ". ");

        fprintf(stderr, "fsync");
        if (fsync(fd) != 0)
            goto jerror;
        fprintf(stderr, ". ");

        fprintf(stderr, "fstat");
        if (fstat(fd, &s) != 0)
            goto jerror;
        fprintf(stderr, ". ");

        if (*ct != (size_t)s.st_size) {
            fprintf(stderr, "fstat size mismatch: %zu is not %zu\n",
                (size_t)s.st_size, *ct);
            (void)close(fd);
            continue;
        }

        /* mmap(2) reports failure as MAP_FAILED, not NULL */
        fprintf(stderr, "mmap");
        addr = mmap(NULL, (size_t)s.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED)
            goto jerror;
        fprintf(stderr, ". ");

        (void)close(fd); /* Can also be left off, doesn't matter */

        fprintf(stderr, "[0]");
        if (((char*)addr)[0] != '\0')
            goto jerror;
        fprintf(stderr, ". ");

        fprintf(stderr, "[s.st_size-4]");
        if (((char*)addr)[s.st_size-4] != 'a')
            goto jerror;
        fprintf(stderr, ". ");

        /* Touch every byte of the mapping in order */
        fprintf(stderr, "[ALL IN ORDER]");
        for (i = 0; i < (size_t)s.st_size; ++i) {
            char x = ((volatile char*)addr)[i];
            (void)x;
        }
        fprintf(stderr, ". ");

        fprintf(stderr, "munmap");
        if (munmap(addr, (size_t)s.st_size) != 0)
            goto jerror;
        fprintf(stderr, ".");

        fprintf(stderr, "\n");
    }

jleave:
    (void)unlink(PATH);
    return estat;

jerror:
    fprintf(stderr, "\n%s\n", strerror(errno));
    estat = 1;
    goto jleave;
}
import os,sys,time,mmap,zlib

PAGESIZE = 4096
SIZES = ((2**32) + PAGESIZE + 4, 0)
FILE = 'py-mmap-testfile'

print('Start:', time.gmtime())
for i in SIZES:
    if i == 0:
        break
    print('Testing file size ', str(i), ': ', sep='', end='')
    sys.stdout.flush()
    with open(FILE, "wb+") as f:
        # Seek to the final size minus 4 and write the last 4 bytes
        f.seek(i-4)
        f.write(b'abcd')
        f.flush()
        sb = os.stat(FILE)
        if sb.st_size != i:
            print('size failure:', sb.st_size, ' != ', i, sep='', end='')
            sys.stdout.flush()
        mem = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Indexing an mmap object yields an int in Python 3
        if mem[0] != ord('\0'):
            print('offset 0 failed: ', mem[0], ' ', end='', sep='')
        else:
            print('offset 0 ok ', end='', sep='')
        sys.stdout.flush()
        if mem[i-4] != ord('a'):
            print('offset i-4 failed: ', mem[i-4], ' ', end='', sep='')
        else:
            print('offset i-4 ok ', end='', sep='')
        print('[ALL IN ORDER] ', sep='', end='')
        sys.stdout.flush()
        # Touch every byte of the mapping in order
        for j in range(0, sb.st_size):
            y = mem[j]
        print('ok')
        sys.stdout.flush()
        mem.close()
    os.unlink(FILE)
print('End:', time.gmtime())
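
Reading every byte of a mapping larger than 4 GiB from Python takes a
long time, and the SIGBUS reported above shows up exactly at a page
boundary (INDEX 4096). A possibly quicker probe is to touch only the
first byte of each page. The following is a sketch under assumptions:
probe_pages is a hypothetical helper, not part of 11277.mmap-2.py, and
it relies on mmap objects being usable as context managers (Python 3.2
and later). A SIGBUS cannot be caught from Python code, so with
verbose=True each offset is written to stderr before it is read, and
the last offset printed identifies the failing page.

```python
import mmap
import os
import sys
import tempfile


def probe_pages(path, page_size=mmap.PAGESIZE, verbose=False):
    """Map `path` read-only and read the first byte of every page.

    Returns the number of pages touched. If the process dies with
    SIGBUS, the last offset printed (verbose=True) is the culprit.
    Note: mapping an empty file raises ValueError.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mem:
            for offset in range(0, len(mem), page_size):
                if verbose:
                    print("PAGE OFFSET", offset, file=sys.stderr)
                    sys.stderr.flush()
                mem[offset]  # the read itself is the test
                touched += 1
    return touched


if __name__ == "__main__":
    # Demo with a small file; pass a path on the command line to
    # probe an existing (possibly very large) test file instead.
    if len(sys.argv) > 1:
        print(probe_pages(sys.argv[1], verbose=True), "pages touched")
    else:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "probe-testfile")
            with open(path, "wb") as f:
                f.write(b"\0" * (3 * mmap.PAGESIZE + 1))
            print(probe_pages(path), "pages touched")
```

This does not replace the byte-by-byte loop of the test script: it
only narrows down which page fails first.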