[issue30919] Shared Array Memory Allocation Regression

2017-07-23 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30919] Shared Array Memory Allocation Regression

2017-07-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:


New changeset 3051f0b78e53d1b771b49375dc139ca13f9fd76e by Antoine Pitrou in 
branch 'master':
bpo-30919: shared memory allocation performance regression in multiprocessing 
(#2708)
https://github.com/python/cpython/commit/3051f0b78e53d1b771b49375dc139ca13f9fd76e


--




[issue30919] Shared Array Memory Allocation Regression

2017-07-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Serhiy, I've changed the strategy in PR 2708: the partition's free space is now 
checked before deciding whether to locate the file there.
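The free-space test described here could look roughly like the sketch below. The helper name and the fallback logic are illustrative assumptions, not the actual PR 2708 code:

```python
import os

def has_enough_free_space(path, size):
    # Rough check: bytes available to unprivileged users on the
    # filesystem containing `path`.
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize >= size

# Prefer a memory-backed directory only when the arena fits there.
size = 64 * 1024 * 1024
candidate = "/dev/shm"
if os.path.isdir(candidate) and has_enough_free_space(candidate, size):
    directory = candidate
else:
    directory = None  # fall back to the default temporary directory
```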

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Again, setting TMPDIR is a poor workaround, as it will impact any library 
creating temporary files.

I haven't found anyone complaining about limited shared array size on 2.7, so 
it's likely not a common concern.  Right now, anyone creating a huge shared 
array on 3.x faces huge performance issues because the array is backed by disk, 
so it doesn't sound realistic to assume people are doing that.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This can break Python 3 applications that work with shared arrays larger than 
physical memory. If we make this change, it should be optional.

Setting TMPDIR=/dev/shm restores the 2.7 performance even on versions that we 
decide not to patch.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The main issue here is to restore the performance of the 2.7 version.  Setting 
TMPDIR sounds too general and might impact other libraries.  I think it's ok if 
the shared array size is limited by the computer's virtual memory size, since 
the whole point is to expose a fast communication channel between processes.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, /dev/shm is ok on my computer.

But there is one more drawback to this patch. Currently the size of a shared 
array is limited by the free space on the file system used for temporary files. 
If that isn't enough, you can easily set TMPDIR to another path. In 2.7, and 
with a hardcoded /dev/shm, it is limited by the size of physical memory (perhaps 
plus swap), which is not easy to increase, especially for an unprivileged user.

The effect of hardcoding /dev/shm can be achieved by setting TMPDIR to 
/dev/shm. Using os.ftruncate() adds roughly another 20% speedup. This looks 
mainly like a documentation issue to me: the regression and the workaround 
should be documented.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The size of file system mounted on "/run/user/${uid}" is limited (only 200 
> MiB on my computer)

Hmm, you're right.  Is /dev/shm ok on your computer?

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The size of the file system mounted on "/run/user/${uid}" is limited (only 200 
MiB on my computer). get_temp_dir() will succeed, but this may not be enough 
for allocating a large shared array. And it is harder for the user to control, 
because the location is not specified by an environment variable.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Gareth, Dimitar, you may want to take a look at 
https://github.com/python/cpython/pull/2708

--
stage: needs patch -> patch review




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
pull_requests: +2774




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The original changeset in Richard's repository is 
https://hg.python.org/sandbox/sbt/rev/6c8554a7d068.  Unless Richard answers 
otherwise, I think it's likely the performance degradation was an oversight.

Given that the code we're talking about is POSIX-specific, and truncate() (or 
its sibling os.ftruncate()) is specified to always zero-fill the extended area, 
it should be safe to use.

Another related point is that the arena file is created inside get_temp_dir(), 
which allocates something in /tmp.  On modern Linux systems at least, /tmp is 
usually backed by persistent storage.  Instead, we could use 
/run/user/{os.getuid()}, which uses a tmpfs and would therefore save on I/O 
when later accessing the shared array.

--
nosy: +pitrou
stage:  -> needs patch
versions: +Python 3.7 -Python 2.7, Python 3.5, Python 3.6




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

I propose:

1. Ask Richard Oudkerk why in changeset 3b82e0d83bf9 the temporary file is 
zero-filled and not truncated. Perhaps there's some file system where this is 
necessary? (I tested HFS+ which doesn't support sparse files, and zero-filling 
seems not to be necessary, but maybe there's some other file system where it 
is?)

2. If there's no good reason for zero-filling the temporary file, replace it 
with a call to os.ftruncate(fd, size).

3. Update the documentation to mention the performance issue when porting 
multiprocessing code from 2 to 3. Unfortunately, I don't think there's any 
advice that the documentation can give that will help work around it -- 
monkey-patching works but is not supported.

4. Consider writing a fix, or at least a supported workaround. Here's a 
suggestion: update multiprocessing.sharedctypes and multiprocessing.heap so 
that they use anonymous maps in the 'fork' context. The idea is to update the 
RawArray and RawValue functions so that they take the context, and then pass 
the context down to _new_value, BufferWrapper.__init__ and thence to 
Heap.malloc where it can be used to determine what kind of Arena (file-backed 
or anonymous) should be used to satisfy the allocation request. The Heap class 
would have to segregate its blocks according to what kind of Arena they come 
from.
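The context-aware dispatch in point 4 could be sketched roughly as below. The class and function names here are illustrative assumptions, not the actual multiprocessing internals:

```python
import mmap
import os
import tempfile

class AnonymousArena:
    """Backed by an anonymous mapping; shareable only via fork()."""
    def __init__(self, size):
        self.size = size
        self.buffer = mmap.mmap(-1, size)

class FileArena:
    """Backed by a temporary file; also works with 'spawn'/'forkserver'."""
    def __init__(self, size):
        self.size = size
        self.fd, self.path = tempfile.mkstemp()
        os.ftruncate(self.fd, size)  # sparse, zero-filled extension
        self.buffer = mmap.mmap(self.fd, size)

def make_arena(size, start_method):
    # Heap.malloc could dispatch on the context's start method like this.
    if start_method == "fork":
        return AnonymousArena(size)
    return FileArena(size)
```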

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Dimitar Tasev

Dimitar Tasev added the comment:

Thank you, that is indeed the solution I ended up with, along with a lengthy 
explanation of why it was necessary.

Do you think that it's worth updating the `multiprocessing` documentation to 
make users aware of that behaviour?

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

I see now that the default start method is 'fork' (except on Windows), so 
calling set_start_method is unnecessary.

Note that you don't have to edit multiprocessing/heap.py; you can 
"monkey-patch" it in the program that needs the anonymous mapping:

import mmap
from multiprocessing.heap import Arena

def anonymous_arena_init(self, size, fd=-1):
    "Create Arena using an anonymous memory mapping."
    self.size = size
    self.fd = fd  # still kept but is not used!
    self.buffer = mmap.mmap(-1, self.size)

Arena.__init__ = anonymous_arena_init

As for what it will break: any code that uses the 'spawn' or 'forkserver' 
start methods.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Dimitar Tasev

Dimitar Tasev added the comment:

I have looked into your advice about changing 
multiprocessing.heap.Arena.__init__: I removed the code that allocated the file 
and reverted to the old behaviour.

Some brief runs suggest this brings back the old behaviour of allocating the 
space in RAM rather than with I/O. I am not sure what this might break, and it 
might make other usages of multiprocessing unstable!

Can anyone think of anything this change might break? The Arena.__init__ code 
is the one from Python 2.7:

class Arena(object):
    def __init__(self, size, fd=-1):
        self.size = size
        self.fd = fd  # still kept but is not used!
        self.buffer = mmap.mmap(-1, self.size)

There does not seem to be a difference regardless of whether the start method 
is set to 'fork' via multiprocessing.set_start_method('fork').

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

Nonetheless this is bound to be a nasty performance regression for many people 
doing big data processing with NumPy/SciPy/Pandas and multiprocessing when 
moving from 2 to 3, so even if it can't be fixed, the documentation ought to 
warn about the problem and explain how to work around it.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

If you need the 2.7 behaviour (anonymous mappings) in 3.5 then you can still do 
it, with some effort. I think the approach that requires the smallest amount of 
work would be to ensure that subprocesses are started using fork(), by calling 
multiprocessing.set_start_method('fork'), and then monkey-patch 
multiprocessing.heap.Arena.__init__ so that it creates anonymous mappings using 
mmap.mmap(-1, size).

(I suggested above that Python could be modified to create anonymous mappings 
in the 'fork' case, but now that I look at the code in detail, I see that it 
would be tricky, because the Arena class has no idea about the Context in which 
it is going to be used -- at the moment you can create one shared object and 
then pass it to subprocesses under different Contexts, so the shared objects 
have to support the lowest common denominator.)

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Dimitar Tasev

Dimitar Tasev added the comment:

If I understand correctly, there is no way to force the old behaviour in Python 
3.5, i.e. to use an anonymous memory mapping in multiprocessing.heap.Arena so 
that the memory can be shared between the processes instead of writing to a 
shared file?

The data sizes are usually around 20 GB, and writing that out to a file is not 
desirable. As I understand from Gareth Rees' comment, ftruncate() will speed up 
initialisation, but any processing afterwards would be I/O-bound.

To shed more light to the processing going on, the data is handled as a 3D 
array, so each process gets a 2D array to operate on, and no information needs 
to be shared between processes.

If the anonymous memory mapping cannot be forced, then the multiprocessing 
module with a shared array becomes unusable for me. Are you aware of any way to 
use the multiprocessing module to run execution in parallel that somehow 
doesn't use a shared array?

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Gareth Rees

Gareth Rees added the comment:

Note that some filesystems (e.g. HFS+) don't support sparse files, so creating 
a large Arena will still be slow on these filesystems even if the file is 
created using ftruncate().

(This could be fixed, for the "fork" start method only, by using anonymous maps 
in that case.)

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Gareth Rees

Gareth Rees added the comment:

In Python 2.7, multiprocessing.heap.Arena uses an anonymous memory mapping on 
Unix. Anonymous memory mappings can be shared between processes but only via 
fork().

But Python 3 supports other ways of starting subprocesses (see issue 8713 [1]) 
and so an anonymous memory mapping no longer works. So instead a temporary file 
is created, filled with zeros to the given size, and mapped into memory (see 
changeset 3b82e0d83bf9 [2]). It is the zero-filling of the temporary file that 
takes the time, because this forces the operating system to allocate space on 
the disk.

But why not use ftruncate() (instead of write()) to quickly create a file with 
holes? POSIX says [3], "If the file size is increased, the extended area shall 
appear as if it were zero-filled" which would seem to satisfy the requirement.

[1] https://bugs.python.org/issue8713
[2] https://hg.python.org/cpython/rev/3b82e0d83bf9
[3] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html
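The difference between zero-filling and ftruncate() can be demonstrated with a small self-contained script (sizes scaled down for illustration):

```python
import os
import tempfile

size = 1 << 20  # 1 MiB for the demo; the real arenas are far larger

# What multiprocessing does: explicitly write zeros, forcing the OS to
# allocate every block on disk.
fd, path = tempfile.mkstemp()
with open(fd, "wb") as f:
    f.write(b"\0" * size)
assert os.path.getsize(path) == size
os.unlink(path)

# The ftruncate() alternative: the file is extended instantly, and POSIX
# guarantees the extended area reads back as zeros.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, size)
assert os.path.getsize(path) == size
with open(path, "rb") as f:
    assert f.read(16) == b"\0" * 16
os.close(fd)
os.unlink(path)
```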

--
nosy: +g...@garethrees.org




[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It can be significantly sped up if we replace the writing loop with

if size:
    os.lseek(self.fd, size, 0)
    os.write(self.fd, b'')
    os.lseek(self.fd, 0, 0)

or just with

os.truncate(self.fd, size)

(not sure the latter always works).
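One caveat with the seek-based variant: on POSIX a write() of zero bytes does not extend the file, so the portable form of the trick seeks to size - 1 and writes a single byte:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
size = 1 << 20

os.lseek(fd, size - 1, os.SEEK_SET)
os.write(fd, b"\0")          # one byte at the end extends the file
os.lseek(fd, 0, os.SEEK_SET)

assert os.path.getsize(path) == size
os.close(fd)
os.unlink(path)
```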

But instantiating an open Arena will still be slower than in 2.7.

--




[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This is because instantiating a new Arena is much slower in Python 3.

./python -m timeit -r1 -s 'from multiprocessing.heap import Arena' 'Arena(2**26)'

Python 3.7: 1 loop, best of 1: 1.18 sec per loop
Python 2.7: 10 loops, best of 1: 9.19 usec per loop

Maybe the regression was introduced by issue8713.

--
nosy: +davin, sbt, serhiy.storchaka




[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Dimitar Tasev

New submission from Dimitar Tasev:

Hello, I have noticed a significant performance regression when allocating a 
large shared array in Python 3.x versus Python 2.7. The affected module seems 
to be `multiprocessing`.

The function I used for benchmarking:

from timeit import timeit
timeit('sharedctypes.Array(ctypes.c_float, 500*2048*2048)',
       'from multiprocessing import sharedctypes; import ctypes', number=1)

And the results from executing it:

Python 3.5.2
Out[2]: 182.68500420999771

---

Python 2.7.12
Out[6]: 2.124835968017578

I will try to provide any information you need. Right now I am looking at 
callgrind/cachegrind without Debug symbols, and can post that, in the meantime 
I am building Python with Debug and will re-run the callgrind/cachegrind.

Allocating the same-size array with numpy doesn't show a difference between 
Python versions; the numpy command used was `numpy.full((500,2048,2048), 5.0)`. 
Allocating the same number of list members also shows no difference: 
`arr = [5.0]*(500*2048*2048)`.

--
files: shared_array_alloc.py
messages: 298285
nosy: dtasev
priority: normal
severity: normal
status: open
title: Shared Array Memory Allocation Regression
type: performance
versions: Python 2.7, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file47009/shared_array_alloc.py
