On 28 Mar 2017, at 6:11, INADA Naoki wrote:

I managed to install pyopencl and run the script.  It takes more than
2 hours, and uses only 7GB RAM.
Maybe a faster backend for OpenCL is required?

I used Microsoft Azure Compute, Standard_A4m_v2 (4 cores, 32 GB
memory) instance.

I suppose the Azure instance might not have enough computing power, so it takes much longer to reach the phase where the memory requirements increase. Do you have access to the output that was produced?

By the way, this has nothing to do with OpenCL. OpenCL isn't used by the log_reduction.py script at all. It is listed in the dependencies because some other things use it.

An easier way to reproduce this is needed...

Yes, I agree, but it's not easy: so far none of the smaller existing examples exhibit the problem. I'll see what I can do.

My best idea about what's going on at the moment is that memory fragmentation is worse in Python 3.6 for some reason. The virtual memory size indicates that a large address space is acquired, but the resident memory size is smaller, indicating that not all of that address space is actually used. In fact, the code might be especially prone to fragmentation because it takes a lot of small NumPy arrays and concatenates them into larger arrays. But I'm still surprised that this is only a problem with Python 3.6 (if this hypothesis is correct).

Jan

Generally speaking, a gap between VMM and RSS doesn't indicate fragmentation.
If the ratio of RSS to total allocated memory is bigger than 1.5, it may be
fragmentation.
And a large VMM won't cause swapping.  Only RSS is meaningful.
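
A rough way to check the ratio mentioned above, as a sketch only. It assumes Linux (`/proc/self/status`) and uses `tracemalloc` as a stand-in for "total allocated memory", which only covers allocations made through Python's allocators after tracing starts. The helper names and the workload are made up for illustration.

```python
import os
import tracemalloc


def parse_vmrss_bytes(status_text):
    """Return VmRSS in bytes from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:     123456 kB"
            return int(line.split()[1]) * 1024
    return None


def rss_to_allocated_ratio(rss_bytes, allocated_bytes):
    """Ratio of resident memory to the memory the program believes it holds."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    return rss_bytes / allocated_bytes


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    tracemalloc.start()
    data = [bytes(1000) for _ in range(100_000)]  # hypothetical workload
    allocated, _peak = tracemalloc.get_traced_memory()
    with open("/proc/self/status") as f:
        rss = parse_vmrss_bytes(f.read())
    if rss is not None and allocated > 0:
        print("RSS / allocated = %.2f" % rss_to_allocated_ratio(rss, allocated))
```

In a small process the interpreter's own baseline inflates the ratio, so the number is only informative once the workload dominates memory use.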

I suppose you are right that one cannot deduce fragmentation from the VMM and RSS numbers. But I think RSS might not be meaningful in this case either. My understanding from [the Wikipedia description] is that it doesn't account for parts of the memory that have been written to swap. In other words, RSS will never exceed the size of the physical RAM. VSS is also only partially useful because it just gives the size of the address space, not all of which might be in use.

Anyways, I'm getting a swap usage of about 30GB with Python 3.6 and zsh's time reports 2339977 page faults from disk vs. 107 for Python 3.5.
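
For reference, the same kind of counter can be read from inside a process. This is a minimal sketch, assuming a Unix system where the standard `resource` module is available. Major faults are the ones that required reading a page from disk. The function name is made up for illustration.

```python
import resource


def page_fault_counts():
    """Return major (from disk) and minor page fault counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {"major": usage.ru_majflt, "minor": usage.ru_minflt}


if __name__ == "__main__":
    counts = page_fault_counts()
    print("major faults: %d, minor faults: %d" % (counts["major"], counts["minor"]))
```

Calling it before and after a suspect phase of the script would show in which phase the faults from disk accumulate.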

I have some code to measure the unique set size (USS) and will see what numbers I get with that.
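
One possible way to get a USS figure, which is not necessarily the code referred to above: on Linux, sum the `Private_Clean` and `Private_Dirty` fields over all mappings in `/proc/self/smaps`. This is a sketch under that assumption, and the function name is hypothetical.

```python
import os


def parse_uss_bytes(smaps_text):
    """Sum Private_Clean and Private_Dirty (reported in kB) over all mappings."""
    total_kib = 0
    for line in smaps_text.splitlines():
        if line.startswith(("Private_Clean:", "Private_Dirty:")):
            # Line format: "Private_Dirty:        40 kB"
            total_kib += int(line.split()[1])
    return total_kib * 1024


if __name__ == "__main__" and os.path.exists("/proc/self/smaps"):
    with open("/proc/self/smaps") as f:
        print("USS: %.1f MiB" % (parse_uss_bytes(f.read()) / 2**20))
```

Like RSS, this counts only pages currently resident in RAM, so pages that have been swapped out are not included. The per-mapping `Swap:` field in the same file reports those separately.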

Jan
--
https://mail.python.org/mailman/listinfo/python-list
