[issue9942] Allow memory sections to be OS MERGEABLE

2011-05-22 Thread s7v7nislands

Changes by s7v7nislands s7v7nisla...@gmail.com:


--
nosy: +s7v7nislands

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9942
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9942] Allow memory sections to be OS MERGEABLE

2011-05-22 Thread Martin v. Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

In order to arrive at some resolution of this issue, I'm answering the original
question ("Should Python enable a way for folks to inform the OS of
MADV_MERGEABLE memory?"). The discussion has shown that the answer is no;
there are no pages of memory where this would provide any advantage.

Closing as won't fix. Anybody reopening it should

a) provide a patch with the actual change to be made, and
b) accompany it with a benchmark demonstrating some gain.

--
nosy: +loewis
resolution:  -> wont fix
status: open -> closed




[issue9942] Allow memory sections to be OS MERGEABLE

2010-10-27 Thread Konstantin Svist

Konstantin Svist fry@gmail.com added the comment:

This issue sounds very interesting to me for a somewhat different reason.
My problem is that I'm trying to run multiple processes on separate CPUs/cores 
with os.fork(). In short, the data set is the same (~2GB) and the separate 
processes do whatever they need, although each fork treats the data set as 
read-only.
Right after the fork, data is shared and fits in RAM nicely, but after a few 
minutes each child process runs over a bunch of the data set (thereby modifying 
the ref counters) and the data is copied for each process. RAM usage jumps from 
15GB to 30GB and the advantage of a fork is gone.

It would be great if there was an option to separate out the ref counters for 
specific data structures, since it's obviously a bad idea to turn it on by 
default for everything and everyone.

--
nosy: +Fry-kun




[issue9942] Allow memory sections to be OS MERGEABLE

2010-10-27 Thread Dave Malcolm

Dave Malcolm dmalc...@redhat.com added the comment:

One possible use for this: mark the str buffers of PyUnicodeObject instances 
when demarshalling docstrings from disk; in theory these ought not to change, 
and can be quite large: the bulk of the memory overhead is stored in a separate 
allocation from the object, and thus isn't subjected to the ob_refcnt twiddling.

No idea if it's worth it though; the syscall overhead might slow down module 
import; also, KSM works at the level of 4K pages, and it's not clear that the 
allocations would line up nicely with pages.
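To illustrate the alignment concern, here is a minimal sketch (not from the thread) of the arithmetic involved. It assumes 4096-byte pages, and the helper name whole_pages_inside is hypothetical. Given an allocation's address and size, it computes the part that consists of whole pages, which is the only part page-level merging could ever apply to:

```python
PAGE_SIZE = 4096  # assumed page size; mmap.PAGESIZE gives the real value


def whole_pages_inside(addr, size, page_size=PAGE_SIZE):
    """Return (start, length) of the page-aligned region lying entirely
    inside [addr, addr + size), or None if no whole page fits."""
    start = -(-addr // page_size) * page_size        # round start up to a page boundary
    end = ((addr + size) // page_size) * page_size   # round end down to a page boundary
    if end <= start:
        return None
    return start, end - start
```

Under these assumptions, a 3000-byte buffer at address 100 contains no whole page at all, while a 10000-byte buffer at address 4000 contains only two whole pages (8192 bytes).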

FWIW, various related ideas here:
  http://dmalcolm.livejournal.com/4183.html
Again, no idea if these are worthwhile, this was a brainstorm on my blog, and 
some of the ideas would involve major surgery to CPython to implement.

--
nosy: +dmalcolm




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-25 Thread Georg Brandl

Georg Brandl ge...@python.org added the comment:

> My first thought is "Why is the reference counter stored with the object
> itself?"

Because if you move the reference counter out of the object, you a) add another
indirection and b) depending on how you implement it, require a certain amount
of additional memory per object.

It's far from obvious that the possible benefits are worth this, and it needs
to be tested carefully, which nobody has done yet.

--
nosy: +georg.brandl




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Answering the question as best I can: I don't know how the reference
> counter is implemented in CPython, but if it's just a field in a
> struct, then madvise could be sent the memory location starting with
> the byte immediately following the reference counter

Well, first, this would only work for large objects. Most objects in Python are
quite small individually, unless you have very large (unicode or binary)
strings, or very big integers.

Second, madvise() works at page granularity (4096 bytes on most systems),
so the advised region will very likely include the reference count of the
current object.

Third, MADV_MERGEABLE will only be efficient if you have actual duplications of 
whole memory pages (and, practically, if you have enough of them to make a real 
difference). Why do you think you might have such duplication in your workload?

--
nosy: +pitrou




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-25 Thread Kevin Hunter

Kevin Hunter hunt...@earlham.edu added the comment:

> Well, first, this would only work for large objects. [...]
> Why do you think you might have such duplication in your workload?

Some of the projects with which I work involve multiple manipulations of large 
datasets.  Often, we use Python scripts as first and third stages in a 
pipeline.  For example, in one current workflow, we read a large file into a 
cStringIO object, do a few manipulations with it, pass it off to a second 
process, and await the results.  Meanwhile, the large file is sitting around in 
memory because we need to do more manipulations after we get results back from 
the second application in the pipeline.  Graphically:

Python Script A  ->  External App  ->  Python Script A
read large data      process data      more manipulations

Within a single process, I don't see any gain to be had.  However, in this one
use-case, a number of copies of this pipeline run concurrently, each with
slightly different command-line parameters.

--




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

>> Well, first, this would only work for large objects. [...]
>> Why do you think you might have such duplication in your workload?
>
> Some of the projects with which I work involve multiple manipulations
> of large datasets.  Often, we use Python scripts as first and third
> stages in a pipeline.  For example, in one current workflow, we read a
> large file into a cStringIO object, do a few manipulations with it,
> pass it off to a second process, and await the results.

Why do you read it into a cStringIO? A cStringIO has the same interface
as a file, so you could simply operate on the file directly.

(you could also try mmap if you need quick random access to various
portions of the file)
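
As a rough illustration of the mmap suggestion (an editorial sketch in Python 3 syntax, not code from the thread; the helper name read_slice is hypothetical), a file can be mapped read-only and sliced without first copying it into an in-memory buffer:

```python
import mmap


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset`, using a read-only
    memory map instead of reading the whole file into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]
```

Because the mapping is backed by the page cache, several processes that map the same file read-only can share the same physical pages, with no reference counts stored inside the mapped data.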

--




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-25 Thread Kevin Hunter

Kevin Hunter hunt...@earlham.edu added the comment:

> Why do you read it into a cStringIO? A cStringIO has the same interface
> as a file, so you could simply operate on the file directly.

In that particular case, because it isn't actually a file.  That workflow was 
my attempt at simplification to illustrate a point.

I think the point is moot however, as I've gotten what I needed from this 
feature request/discussion.  Not one, but three Python developers seem opposed 
to the idea, or at least skeptical.  That's enough to tell me that my 
first-order supposition that Python objects could be MERGEABLE is not on target.

Cheers.

--




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-24 Thread Kevin Hunter

New submission from Kevin Hunter hunt...@earlham.edu:

Should Python enable a way for folks to inform the OS of MADV_MERGEABLE memory?

I can't speak for other OSs, but Linux 2.6.32 added the ability for processes
to inform the kernel that they have memory that will likely not change for a
while.  This is done through the madvise syscall with MADV_MERGEABLE.

http://www.kernel.org/doc/Documentation/vm/ksm.txt
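
For readers on a current Python, the madvise call described above can be sketched as follows. This is an editorial illustration only, not an API that existed when this issue was filed: it assumes Python 3.8 or later, where mmap objects have a madvise() method and the mmap module exposes MADV_MERGEABLE on platforms that define it. The helper name mark_mergeable is hypothetical.

```python
import mmap


def mark_mergeable(buf):
    """Ask the kernel to consider the mmap object `buf` for KSM merging.

    Returns True if the advice was accepted, False if this platform,
    Python version, or kernel configuration does not support it.
    """
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None or not hasattr(buf, "madvise"):
        return False
    try:
        buf.madvise(advice)  # applies to the whole mapping by default
    except OSError:
        # For example, a kernel built without KSM rejects the advice.
        return False
    return True


# An anonymous, page-aligned mapping; ordinary Python objects are not
# laid out this way, which is the difficulty discussed in this thread.
size = 4 * mmap.PAGESIZE
region = mmap.mmap(-1, size)
accepted = mark_mergeable(region)
```

Even when the advice is accepted, pages are only merged if KSM is enabled on the system and identical pages actually exist.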

After initial conversations in IRC, it was suggested that this would be
difficult in the Python layer, but that the OS doesn't care which page of bytes
it's passed as mergeable.  Thus when I, as an application programmer, know that
I have some objects that will be around for a while, and that won't change, I
can let the OS know that it might be beneficial to merge them.

I suggest this might be a library because it may only be useful for certain 
projects.

--
components: Library (Lib)
messages: 117317
nosy: hunteke
priority: normal
severity: normal
status: open
title: Allow memory sections to be OS MERGEABLE
type: feature request




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-24 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

With CPython, even objects that don't change see their reference counter 
modified quite frequently, just by looking at them.
What kind of memory would you mark this way?
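
The effect can be observed with a small sketch (an editorial illustration, assuming CPython, where sys.getrefcount reports the count stored in the object's header). Merely storing new references to an object that is never otherwise modified still writes to the memory the object occupies:

```python
import sys

data = object()                  # stands in for a "read-only" object
before = sys.getrefcount(data)
holders = [data, data, data]     # only storing references, not changing data
after = sys.getrefcount(data)
delta = after - before           # number of references added by the list
```

Each new reference increments the counter inside the object itself, so the page holding the object is modified even though its logical value is not.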

--
nosy: +amaury.forgeotdarc




[issue9942] Allow memory sections to be OS MERGEABLE

2010-09-24 Thread Kevin Hunter

Kevin Hunter hunt...@earlham.edu added the comment:

My first thought is "Why is the reference counter stored with the object
itself?"  I imagine there are very good reasons, however, and this is not an
area in which I have much mastery.

Answering the question as best I can: I don't know how the reference counter is 
implemented in CPython, but if it's just a field in a struct, then madvise 
could be sent the memory location starting with the byte immediately following 
the reference counter.

If there's more to it than that, I'll have to back off with "I don't know."
I'm perhaps embarrassed that I'm not at all a Python developer, merely a Python 
application developer.  I have a few Python projects that are memory hungry, 
that at first glance I believe to be creating MERGEABLE objects.

--
