[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2015-04-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Instead of the loop you can use writelines():

f.writelines([b'\0' * bs] * (size // bs))

It would be nice to add a comment explaining why os.ftruncate() or seek+write
can't be used here, or at least a link to this issue with a short explanation.
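
For example, something like this (untested sketch; the sizes and the temporary
file are just for illustration, and the final write() handles any remainder
when size isn't an exact multiple of bs):

import tempfile

size = 10 * 1024**3        # 10 GB, as in the original report
bs = 1024 * 1024           # 1 MB blocks
with tempfile.TemporaryFile() as f:
    f.writelines([b'\0' * bs] * (size // bs))
    f.write(b'\0' * (size % bs))   # remainder, if any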

--
nosy: +serhiy.storchaka

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2015-04-14 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Actually, recent POSIX states unconditionally that:

« If the file previously was smaller than this size, ftruncate() shall increase 
the size of the file. If the file size is increased, the extended area shall 
appear as if it were zero-filled. »

(from http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html)
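
A quick way to see that guarantee in action (untested sketch, POSIX only):

import os
import tempfile

with tempfile.TemporaryFile() as f:
    os.ftruncate(f.fileno(), 1024)           # extend the empty file to 1 KB
    assert f.read(1024) == b'\0' * 1024      # extended area reads as zeros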

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2015-04-13 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
versions:  -Python 3.4

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2015-04-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Ok, I've committed the patch. If desired, the generic API for shared memory can 
be tackled in a separate issue. Thank you Médéric!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2015-04-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0f944e424d67 by Antoine Pitrou in branch 'default':
Issue #21116: Avoid blowing memory when allocating a multiprocessing shared
https://hg.python.org/cpython/rev/0f944e424d67

--
nosy: +python-dev

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-05 Thread Charles-François Natali

Charles-François Natali added the comment:

Indeed, I think it would make sense to consider this for 3.4, and even 2.7
if we opt for a simple fix.

As for the best way to fix it in the meantime, I'm fine with buffered
zero-filling (the mere fact that no one ever complained until now probably
means that the performance isn't a show-stopper for users).

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-04 Thread Médéric Boquien

Médéric Boquien added the comment:

Thanks for the explanations, Charles-François. I guess the new API would not
arrive before 3.5 at the earliest. Is there still a chance to integrate my
patch (or any other) to improve the situation for the 3.4 series, though?

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-03 Thread Charles-François Natali

Charles-François Natali added the comment:

> If I remember correctly the problem is that some OS like linux (and
> probably others) do not really allocate space until something is written.
> If that's the case then the process may get killed later on when it writes
> something in the array.

Yes, it's called overcommitting, and it's a good thing. It's exactly the
same thing for memory: malloc() can return non-NULL, and the process will
get killed when first writing to the page in case of memory pressure.
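
A sketch of what overcommit looks like from Python (whether the call succeeds
depends on the kernel's overcommit settings and on available RAM and swap, so
this is illustrative only):

import mmap

# On an overcommitting kernel this can succeed with far less free RAM:
# physical pages are only committed once the mapping is written to.
m = mmap.mmap(-1, 8 * 1024**3)   # 8 GB anonymous mapping
m.close()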

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-03 Thread Médéric Boquien

Médéric Boquien added the comment:

> the process will get killed when first writing to the page in case of
> memory pressure.

According to the documentation, the returned shared array is zeroed. 
https://docs.python.org/3.4/library/multiprocessing.html#module-multiprocessing.sharedctypes

In that case, because the entire array is written at allocation, the process
is expected to get killed if allocating more memory than available. Unless I
am misunderstanding something, which is entirely possible.

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-03 Thread Charles-François Natali

Charles-François Natali added the comment:

> Also, the FreeBSD man page for mmap() has the following warning:

That's mostly important for real file-backed mapping.
In our case, we don't want a file-backed mmap: we expect the mapping to fit
entirely in memory, so the writeback/read performance isn't that important
to us.

> Using truncate() to zero extend is not really portable: it is only
> guaranteed on XSI-compliant POSIX systems.

Now that's annoying.
How about trying file.truncate() within a try block, and falling back to
zero-filling if an error is raised?
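
Something along these lines (a rough, untested sketch of the idea, not a
patch; zero_extend and bs are made-up names):

import os

def zero_extend(f, size, bs=1024 * 1024):
    try:
        # Zero-extends on XSI-compliant systems.
        os.ftruncate(f.fileno(), size)
    except OSError:
        # Fall back to buffered zero-filling where truncate() can't extend.
        for _ in range(size // bs):
            f.write(b'\0' * bs)
        f.write(b'\0' * (size % bs))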

Doing a lot of I/O for an object which is supposed to be used for shared
memory is sad.

Or maybe it's time to add an API to access shared memory from Python (since
that's really what we're trying to achieve here).

> According to the documentation, the returned shared array is zeroed.
> In that case, because the entire array is written at allocation, the
> process is expected to get killed if allocating more memory than
> available. Unless I am misunderstanding something, which is entirely
> possible.

Having the memory zero-filled doesn't require a write at all: when you do an
anonymous memory mapping of, let's say, 1 GB, the kernel doesn't
pre-emptively zero-fill it; that would be way too slow. Usually it just sets
up the process page table to make this area a COW of a single zero page:
upon read, you'll read zeros, and upon write, it'll duplicate the page as needed.
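
You can see this from Python with an anonymous mapping (sketch; the size is
arbitrary):

import mmap

m = mmap.mmap(-1, 1024**3)                # 1 GB anonymous mapping
assert m[0] == 0 and m[len(m) - 1] == 0   # reads return zeros, no write needed
m.close()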

The only reason the code currently zero-fills the file is to avoid the
portability issues detailed by Richard.

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Or maybe it's time to add an API to access shared memory from Python
> (since that's really what we're trying to achieve here).

That sounds like a good idea. Especially since we now have the memoryview type.

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-02 Thread Charles-François Natali

Charles-François Natali added the comment:

Zero-filling mmap's backing file isn't really optimal: why not use truncate()
instead? This way, it'll completely avoid I/O on filesystems that support
sparse files, and should still work on filesystems that don't.
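
For illustration (untested sketch; on a filesystem with sparse file support
the blocks are only allocated when actually written):

import os
import tempfile

f = tempfile.TemporaryFile()
f.truncate(10 * 1024**3)                  # 10 GB logical size, (almost) no I/O
print(os.fstat(f.fileno()).st_size)       # 10737418240
f.close()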

--
nosy: +neologix

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-02 Thread Médéric Boquien

Médéric Boquien added the comment:

If I remember correctly the problem is that some OS like linux (and probably 
others) do not really allocate space until something is written. If that's the 
case then the process may get killed later on when it writes something in the 
array.

Here is a quick example:

$ truncate -s 1T test.file
$ ls -lh test.file 
-rw-r--r-- 1 mederic users 1.0T Apr  2 23:10 test.file
$ df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb1   110G   46G   59G  44% /home

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

Using truncate() to zero extend is not really portable: it is only guaranteed 
on XSI-compliant POSIX systems.

Also, the FreeBSD man page for mmap() has the following warning:

WARNING! Extending a file with ftruncate(2), thus creating a big
hole, and then filling the hole by modifying a shared mmap() can
lead to severe file fragmentation.  In order to avoid such
fragmentation you should always pre-allocate the file's backing
store by write()ing zero's into the newly extended area prior to
modifying the area via your mmap().  The fragmentation problem is
especially sensitive to MAP_NOSYNC pages, because pages may be
flushed to disk in a totally random order.

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-04-01 Thread Médéric Boquien

Médéric Boquien added the comment:

I have now signed the contributor's agreement.

As for the unit test, I was looking at it. However, I was wondering how to
write a test that would have triggered the problem. It only shows up for very
large arrays, and it depends on occupied memory and the configuration of the
temp dir. Or should I simply write a test creating, for instance, a 100 MB
array and checking that it has the right length?
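
One possible shape for such a test (hypothetical sketch; a 100 MB array won't
reproduce the original failure by itself, it would only exercise the
zero-filling code path):

import ctypes
import unittest
from multiprocessing.sharedctypes import RawArray

class TestLargeSharedArray(unittest.TestCase):
    def test_100mb_array(self):
        n = (100 * 1024**2) // 8          # 100 MB worth of c_double
        arr = RawArray(ctypes.c_double, n)
        self.assertEqual(len(arr), n)
        self.assertEqual(arr[0], 0.0)     # the array must come back zeroed
        self.assertEqual(arr[n - 1], 0.0)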

--

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Médéric Boquien

New submission from Médéric Boquien:

It is currently impossible to create multiprocessing shared arrays larger than
50% of the memory size under Linux (and I assume other Unices). A simple test
case would be the following:

from multiprocessing.sharedctypes import RawArray
import ctypes

foo = RawArray(ctypes.c_double, 10*1024**3//8)  # Allocate 10GB array

If the array is larger than 50% of the total memory size, the process gets
SIGKILL'ed by the OS. Deactivate the swap for a stronger effect.

Naturally this requires that the tmpfs max size is large enough, which is the
case here: 15 GB max with 16 GB of RAM.

I have tracked down the problem to multiprocessing/heap.py. The guilty line is
f.write(b'\0'*size). Indeed, for very large sizes it is going to create a large
intermediate bytes object (10 GB in my test case), and as much memory again is
going to be allocated to the new shared array, leading to memory consumption
over the limit.

To solve the problem, I have split the zeroing of the shared array into blocks
of 1 MB. I can now allocate arrays as large as the tmpfs maximum size. It also
runs a bit faster: on a test case of a 6 GB RawArray, 3.4.0 takes a total time
of 3.930s, whereas it goes down to 3.061s with the attached patch.
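
The gist of the approach (a sketch; the actual change is in the attached
shared_array.diff, and zero_fill is a made-up name):

def zero_fill(f, size, bs=1024 * 1024):
    # Write the backing file in 1 MB blocks instead of one giant write,
    # so no size-sized intermediate bytes object is ever created.
    zeros = b'\0' * bs
    for _ in range(size // bs):
        f.write(zeros)
    f.write(b'\0' * (size % bs))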

--
components: Library (Lib)
files: shared_array.diff
keywords: patch
messages: 215258
nosy: mboquien
priority: normal
severity: normal
status: open
title: Failure to create multiprocessing shared arrays larger than 50% of 
memory size under linux
versions: Python 3.4
Added file: http://bugs.python.org/file34685/shared_array.diff

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Médéric Boquien

Médéric Boquien added the comment:

Updated the patch not to create a uselessly large array if the size is smaller
than the block size.

--
Added file: http://bugs.python.org/file34686/shared_array.diff

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Médéric Boquien

Changes by Médéric Boquien mboqu...@free.fr:


Removed file: http://bugs.python.org/file34685/shared_array.diff

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +sbt
stage:  -> patch review
type:  -> resource usage
versions: +Python 3.5

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Médéric Boquien

Médéric Boquien added the comment:

New update of the patch following Antoine Pitrou's comments. PEP8 does not 
complain anymore.

--
Added file: http://bugs.python.org/file34687/shared_array.diff

[issue21116] Failure to create multiprocessing shared arrays larger than 50% of memory size under linux

2014-03-31 Thread Antoine Pitrou

Antoine Pitrou added the comment:

You overlooked the part where I was suggesting to add a unit test :-)
Also, you'll have to sign a contributor's agreement at 
https://www.python.org/psf/contrib/contrib-form/

Thanks!

--
nosy: +pitrou
