[issue9942] Allow memory sections to be OS MERGEABLE
Changes by s7v7nislands s7v7nisla...@gmail.com: -- nosy: +s7v7nislands ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9942 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9942] Allow memory sections to be OS MERGEABLE
Martin v. Löwis mar...@v.loewis.de added the comment:

In order to arrive at some resolution of this issue, I'm answering the original question ("Should Python enable a way for folks to inform the OS of MADV_MERGEABLE memory?"). The discussion has shown that the answer is "no"; there are no pages of memory where this would provide any advantage. Closing as "won't fix". Anybody reopening it should a) provide a patch with the actual change to be made, and b) accompany it with a benchmark demonstrating some gain.

-- nosy: +loewis resolution: -> wont fix status: open -> closed
[issue9942] Allow memory sections to be OS MERGEABLE
Konstantin Svist fry@gmail.com added the comment:

This issue sounds very interesting to me for a somewhat different reason. My problem is that I'm trying to run multiple processes on separate CPUs/cores with os.fork(). In short, the data set is the same (~2GB) and the separate processes do whatever they need, although each fork treats the data set as read-only. Right after the fork, the data is shared and fits in RAM nicely, but after a few minutes each child process has run over a good part of the data set (thereby modifying the ref counters), and the data is copied for each process. RAM usage jumps from 15GB to 30GB and the advantage of a fork is gone.

It would be great if there were an option to separate out the ref counters for specific data structures, since it's obviously a bad idea to turn it on by default for everything and everyone.

-- nosy: +Fry-kun
[issue9942] Allow memory sections to be OS MERGEABLE
Dave Malcolm dmalc...@redhat.com added the comment:

One possible use for this: mark the str buffers of PyUnicodeObject instances when demarshalling docstrings from disk; in theory these ought not to change, and can be quite large: the bulk of the memory overhead is stored in a separate allocation from the object, and thus isn't subjected to the ob_refcnt twiddling.

No idea if it's worth it though; the syscall overhead might slow down module import; also, KSM works at the level of 4K pages, and it's not clear that the allocations would line up nicely with pages.

FWIW, various related ideas here: http://dmalcolm.livejournal.com/4183.html

Again, no idea if these are worthwhile; this was a brainstorm on my blog, and some of the ideas would involve major surgery to CPython to implement.

-- nosy: +dmalcolm
[issue9942] Allow memory sections to be OS MERGEABLE
Georg Brandl ge...@python.org added the comment:

> My first thought is "Why is the reference counter stored with the object itself?"

Because if you move the reference counter out of the object, you a) add another indirection and b) depending on how you implement it, require a certain amount more memory per object. It's far from obvious that the possible benefits are worth this; it needs to be tested carefully, which nobody has done yet.

-- nosy: +georg.brandl
[issue9942] Allow memory sections to be OS MERGEABLE
Antoine Pitrou pit...@free.fr added the comment:

> Answering the question as best I can: I don't know how the reference counter is implemented in CPython, but if it's just a field in a struct, then madvise could be sent the memory location starting with the byte immediately following the reference counter

Well, first, this would only work for large objects. Most objects in Python are quite small individually, unless you have very large (unicode or binary) strings, or very big integers.

Second, madvise() works at page granularity (4096 bytes on most systems), so it is very likely that the affected range will include the reference count of the current object.

Third, MADV_MERGEABLE will only be efficient if you have actual duplications of whole memory pages (and, practically, if you have enough of them to make a real difference). Why do you think you might have such duplication in your workload?

-- nosy: +pitrou
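The page-granularity point can be made concrete with a little arithmetic. The following is only an illustrative sketch, not something proposed in the thread: the helper name `whole_pages` and the example addresses are hypothetical, and a 4096-byte page is assumed as in the message above. It computes how many pages lie entirely inside a buffer, which are the only pages that could be advised without also covering neighbouring data such as an object header.

```python
def whole_pages(start: int, length: int, page_size: int = 4096):
    """Return (first_page_addr, n_pages) for the pages that lie entirely
    inside the byte range [start, start + length).

    Hypothetical helper for illustration; page_size=4096 is an assumption.
    """
    # Round the start address up to the next page boundary.
    first = -(-start // page_size) * page_size
    # Round the end address down to the previous page boundary.
    last = (start + length) // page_size * page_size
    n_pages = max(0, (last - first) // page_size)
    return first, n_pages


# A 100-byte payload that begins 16 bytes into a page contains no whole page.
small = whole_pages(4096 + 16, 100)

# A 3-page payload that begins 16 bytes into a page contains only 2 whole pages.
large = whole_pages(16, 3 * 4096)
```

Under these assumptions, a small object never owns a full page, and even a large unaligned buffer loses part of a page at each end.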
[issue9942] Allow memory sections to be OS MERGEABLE
Kevin Hunter hunt...@earlham.edu added the comment:

> Well, first, this would only work for large objects.
> [...]
> Why do you think you might have such duplication in your workload?

Some of the projects with which I work involve multiple manipulations of large datasets. Often, we use Python scripts as first and third stages in a pipeline. For example, in one current workflow, we read a large file into a cStringIO object, do a few manipulations with it, pass it off to a second process, and await the results. Meanwhile, the large file is sitting around in memory because we need to do more manipulations after we get results back from the second application in the pipeline. Graphically:

    Python Script A  ->  External App  ->  Python Script A
    read large data      process data      more manipulations

Within a single process, I don't see any gain to be had. However, in this one use-case, this pipeline is running concurrently with a number of copies with slightly different command line parameters.
[issue9942] Allow memory sections to be OS MERGEABLE
Antoine Pitrou pit...@free.fr added the comment:

>> Well, first, this would only work for large objects.
>> [...]
>> Why do you think you might have such duplication in your workload?
>
> Some of the projects with which I work involve multiple manipulations of large datasets. Often, we use Python scripts as first and third stages in a pipeline. For example, in one current workflow, we read a large file into a cStringIO object, do a few manipulations with it, pass it off to a second process, and await the results.

Why do you read it into a cStringIO? A cStringIO has the same interface as a file, so you could simply operate on the file directly.

(You could also try mmap if you need quick random access to various portions of the file.)
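The mmap suggestion can be sketched roughly as follows. This is a minimal illustration, assuming the data really is an on-disk file; the helper name `read_slice` and the temporary demo file are hypothetical and not part of the workflow described in the thread.

```python
import mmap
import os
import tempfile


def read_slice(path: str, offset: int, size: int) -> bytes:
    """Return `size` bytes starting at `offset` without reading the whole file.

    Hypothetical helper: maps the file read-only and slices the mapping.
    """
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + size]


# Demo on a throwaway file: 6-byte header, 10000 filler bytes, 6-byte footer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"x" * 10000 + b"footer")
    demo_path = tmp.name

try:
    head = read_slice(demo_path, 0, 6)
    tail = read_slice(demo_path, 10006, 6)
finally:
    os.unlink(demo_path)
```

With a read-only file mapping, the operating system's page cache backs the data, so several processes mapping the same file do not each need a private copy of it.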
[issue9942] Allow memory sections to be OS MERGEABLE
Kevin Hunter hunt...@earlham.edu added the comment:

> Why do you read it into a cStringIO? A cStringIO has the same interface as a file, so you could simply operate on the file directly.

In that particular case, because it isn't actually a file. That workflow was my attempt at simplification to illustrate a point.

I think the point is moot, however, as I've gotten what I needed from this feature request/discussion. Not one, but three Python developers seem opposed to the idea, or at least skeptical. That's enough to tell me that my first-order supposition that Python objects could be MERGEABLE is not on target.

Cheers.
[issue9942] Allow memory sections to be OS MERGEABLE
New submission from Kevin Hunter hunt...@earlham.edu:

Should Python enable a way for folks to inform the OS of MADV_MERGEABLE memory?

I can't speak for other OSs, but Linux 2.6.32 added the ability for processes to inform the kernel that they have memory that will likely not change for a while. This is done through the madvise syscall with MADV_MERGEABLE.

http://www.kernel.org/doc/Documentation/vm/ksm.txt

After initial conversations in IRC, it was suggested that this would be difficult in the Python layer, but that the OS doesn't care which pages of bytes it is told are mergeable. Thus when I, as an application programmer, know that I have some objects that will be around for a while, and that won't change, I can let the OS know that it might be beneficial to merge them.

I suggest this might be a library because it may only be useful for certain projects.

-- components: Library (Lib) messages: 117317 nosy: hunteke priority: normal severity: normal status: open title: Allow memory sections to be OS MERGEABLE type: feature request
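For readers unfamiliar with the call being requested, here is a rough sketch of what advising MADV_MERGEABLE looks like from Python today. It is not the interface proposed in this issue: it assumes Python 3.8 or later on Linux, where the standard `mmap` module exposes `mmap.madvise()` and the `mmap.MADV_MERGEABLE` constant, and it only covers an explicitly allocated buffer, not ordinary Python objects. The helper name `mark_mergeable` is hypothetical.

```python
import mmap


def mark_mergeable(buf: mmap.mmap) -> bool:
    """Advise the kernel that the pages of `buf` may be merged by KSM.

    Returns True if the advice was accepted and False if the platform or
    kernel does not support it. Hypothetical helper for illustration.
    """
    flag = getattr(mmap, "MADV_MERGEABLE", None)  # only defined on Linux builds
    if flag is None or not hasattr(buf, "madvise"):
        return False
    try:
        buf.madvise(flag)  # no start/length given: covers the whole mapping
    except OSError:
        return False  # e.g. a kernel built without KSM support
    return True


# Assumption: a Unix system. A private anonymous mapping is used because KSM
# only considers private anonymous pages.
region = mmap.mmap(-1, 4 * mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
region[:] = b"\x01" * len(region)  # four pages with identical content
accepted = mark_mergeable(region)
```

Even when the call succeeds, the kernel documentation linked above notes that KSM only merges pages while it is switched on through /sys/kernel/mm/ksm/run, so accepted advice is not evidence that any memory was actually saved.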
[issue9942] Allow memory sections to be OS MERGEABLE
Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

With CPython, even objects that don't change see their reference counter modified quite frequently, just by looking at them. What kind of memory would you mark this way?

-- nosy: +amaury.forgeotdarc
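The effect described here is easy to observe from Python itself. The sketch below assumes CPython and an ordinary freshly created object (recent CPython versions keep a fixed count for certain built-in objects such as None, so those would not show it). Merely holding extra references to an object, without changing its value, changes the count stored in the object's header.

```python
import sys

data = object()  # a fresh, ordinary object whose value never changes

before = sys.getrefcount(data)
readers = [data, data, data]  # three more references; the object is not modified
during = sys.getrefcount(data)
del readers
after = sys.getrefcount(data)

# Difference caused purely by referencing the object.
delta = during - before
```

Because the count lives in the same memory as the object, each such change writes to the page that holds the object, which is why pages full of "unchanging" Python objects do not stay identical for long.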
[issue9942] Allow memory sections to be OS MERGEABLE
Kevin Hunter hunt...@earlham.edu added the comment:

My first thought is "Why is the reference counter stored with the object itself?" I imagine there are very good reasons, however, and this is not an area in which I have much mastery.

Answering the question as best I can: I don't know how the reference counter is implemented in CPython, but if it's just a field in a struct, then madvise could be sent the memory location starting with the byte immediately following the reference counter. If there's more to it than that, I'll have to back off with "I don't know."

I'm perhaps embarrassed to say that I'm not at all a Python developer, merely a Python application developer. I have a few Python projects that are memory hungry and that, at first glance, I believe to be creating MERGEABLE objects.