[issue36867] Make semaphore_tracker track other system resources
Olivier Grisel added the comment: As Victor said, the `time.sleep(1.0)` might lead to Heisenbug-style failures. I am not sure how to write proper strong synchronization in this case, but we could instead go for something intermediate such as the following pattern:

```
...
p.terminate()
p.wait()
for i in range(60):
    try:
        shared_memory.SharedMemory(name, create=False)
    except FileNotFoundError:
        # the OS successfully collected the segment as expected
        break
    time.sleep(1.0)  # wait for the OS to collect the segment
else:
    raise AssertionError(f"Failed to collect shared_memory segment {name}")
```

What do you think?

--
nosy: +Olivier.Grisel
[issue35900] Add pickler hook for the user to customize the serialization of user defined functions and types.
Olivier Grisel added the comment: Adding such a hook would make it possible to reimplement cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class (instead of the slow pickle._Pickler as done currently). This would mean rewriting most of the CloudPickler methods to only rely on a save_reduce-style design instead of directly calling pickle._Pickler.write and pickle._Pickler.save. This is tedious but doable.

There is however a blocker with the current way closures are set: when we pickle a dynamically defined function (e.g. a lambda, a nested function or a function defined in __main__), we currently use a direct call to memoize (https://github.com/cloudpipe/cloudpickle/blob/v0.7.0/cloudpickle/cloudpickle.py#L594) so as to be able to refer to the function itself in its own closure without causing an infinite loop in CloudPickler.dump. This also makes it possible to pickle mutually recursive functions.

The easiest way to avoid having to call memoize explicitly would be to be able to pass the full __closure__ attribute in the state dict of the reduce call. Indeed, the save_reduce function calls memoize automatically after saving the reconstructor and its args but prior to saving the state: https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L3903-L3931

It would therefore be possible to pass a (state, slotstate) tuple with the closure in slotstate so that it could be reconstructed at unpickling time with a setattr: https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L6258-L6272

However, it is currently not possible to setattr __closure__: we can only set individual closure cell contents (which is not compatible with the setattr state trick described above).

To summarize, we need to implement the setter function for the __closure__ attribute of functions and methods to make it natural to reimplement the CloudPickler by inheriting from _pickle.Pickler using the hook described in this issue.

--
nosy: +Olivier.Grisel
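For illustration, a rough sketch of what a reduce-based reducer for dynamically defined functions could look like once __closure__ becomes settable. This is not cloudpickle's actual code: the helper name and the exact state layout are made up, and the slotstate assignment would only work with the proposed __closure__ setter.

```
import types

def _reconstruct_function(code, globals_dict, name):
    # Rebuild a bare function first; the closure and the other attributes are
    # restored afterwards from the state tuple, so a function referring to
    # itself in its own closure can be resolved through the pickler memo.
    return types.FunctionType(code, globals_dict, name)

def reduce_dynamic_function(func):
    state = func.__dict__
    slotstate = {
        "__defaults__": func.__defaults__,
        "__closure__": func.__closure__,  # needs the missing __closure__ setter
    }
    return (_reconstruct_function,
            (func.__code__, func.__globals__, func.__name__),
            (state, slotstate))
```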
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Thanks for the very helpful feedback and guidance during the review.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Shall we close this issue now that the PR has been merged to master?
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Flushing the buffer at each frame commit will cause a medium-sized write every 64kB on average (instead of one big write at the end). So that might actually cause a performance regression for some users if the individual file-object writes induce significant overhead.

In practice though, latency-inducing file objects like filesystem-backed ones are likely to derive from the [BufferedWriter](https://docs.python.org/3/library/io.html#io.BufferedWriter) base class, and the only latency we should really care about is the one induced by the write call overhead itself, in which case the 64kB frame / buffer size should be enough.
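As a small sanity check of that assumption (the file path is just an example), a plain `open(..., "wb")` already hands the pickler an io.BufferedWriter, so flushing ~64 kB frames mostly lands in an in-memory buffer:

```
import io
import pickle

# open(..., "wb") wraps the raw FileIO in an io.BufferedWriter by default,
# so flushing a ~64 kB frame usually only touches the in-memory buffer.
with open("/tmp/data.pkl", "wb") as f:
    print(isinstance(f, io.BufferedWriter))  # True
    pickle.dump(list(range(100000)), f, protocol=4)
```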
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment:

> While we are here, wouldn't be worth to flush the buffer in the C
> implementation to the disk always after committing a frame? This will save
> a memory when dump a lot of small objects.

I think it's a good idea. The C pickler would behave more like the Python pickler. I think framing was intended this way initially. Antoine, what do you think?
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Thanks Antoine, I updated my code to what you suggested.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Alright, I found the source of my refcounting bug. I updated the PR to include the C version of the dump for PyBytes. I ran Serhiy's microbenchmarks on the C version and I could not detect any overhead on small bytes objects while I get a ~20x speedup (and no memory copy) on large bytes objects as expected.

I would like to update the `write_utf8` function but I would need to find a way to wrap `const char* data` as a PyBytes instance without making a memory copy to be able to pass it to my `_Pickle_write_large_bytes`. I browsed the C-API documentation but I could not understand how to do that.

Also I would appreciate any feedback on the code style or things that could be improved in my PR.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I have tried to implement the direct write bypass for the C version of the pickler but I get a segfault in a Py_INCREF on obj during the call to memo_put(self, obj) after the call to _Pickler_write_large_bytes. Here is the diff of my current version of the patch: https://github.com/ogrisel/cpython/commit/4e093ad6993616a9f16e863b72bf2d2e37bc27b4 I am new to the Python C-API so I would appreciate some help on this one.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: BTW, I am looking at the C implementation at the moment. I think I can do it.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Alright, the last version now has ~4% overhead for small bytes.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Actually, I think this can still be improved while keeping it readable. Let me try again :)
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I have pushed a new version of the code that now has a 10% overhead for small bytes (instead of 40% previously). It could be possible to optimize further but I think that would render the code much less readable so I would be tempted to keep it this way. Please let me know what you think.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: In my last comment, I also reported the user times (not spent in OS-level disk access stuff): the code of the PR is on the order of 300-400ms while master is around 800ms or more.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: More benchmarks with the unix time command:

```
(py37) ogrisel@ici:~/code/cpython$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 10.677s
=> peak memory usage: 5.936 GB

real    0m11.068s
user    0m0.940s
sys     0m5.204s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 5.089s
=> peak memory usage: 5.978 GB

real    0m5.367s
user    0m0.840s
sys     0m4.660s
(py37) ogrisel@ici:~/code/cpython$ git checkout issue-31993-pypickle-dump-mem-optim
Switched to branch 'issue-31993-pypickle-dump-mem-optim'
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 6.974s
=> peak memory usage: 2.014 GB

real    0m7.300s
user    0m0.368s
sys     0m4.640s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 10.873s
=> peak memory usage: 2.014 GB

real    0m11.178s
user    0m0.324s
sys     0m5.100s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 4.233s
=> peak memory usage: 2.014 GB

real    0m4.574s
user    0m0.396s
sys     0m4.368s
```

User time is always better in the PR than on master, but it is also much smaller than the system time (disk access) in any case. System time is much less deterministic.
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Note that the time difference is not significant. I reran the last command and got:

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 4.187s
=> peak memory usage: 2.014 GB
```
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I wrote a script to monitor the memory when dumping 2GB of data with python master (C pickler and Python pickler):

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 5.141s
=> peak memory usage: 4.014 GB
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 5.046s
=> peak memory usage: 5.955 GB
```

This is using protocol 4. Note that the C pickler is only making 1 useless memory copy instead of 2 for the Python pickler (one for the concatenation and the other because of the framing mechanism of protocol 4).

Here is the output with the Python pickler fixed in python/cpython#4353:

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk... done in 6.138s
=> peak memory usage: 2.014 GB
```

Basically the 2 spurious memory copies of the Python pickler with protocol 4 are gone.

Here is the script: https://gist.github.com/ogrisel/0e7b3282c84ae4a581f3b9ec1d84b45a
[issue31993] pickle.dump allocates unnecessary temporary bytes / str
New submission from Olivier Grisel <olivier.gri...@ensta.org>: I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python version) make unnecessary memory copies when dumping large str, bytes and bytearray objects.

This is caused by unnecessary concatenation of the opcode and size header with the large bytes payload prior to calling self.write. For protocol 4, an additional copy is caused by the framing mechanism.

I will submit a pull request to fix the issue for the Python version. I am not sure how to test this properly. The BigmemPickleTests seems to be skipped on my 16 GB laptop.

--
components: Library (Lib)
messages: 305975
nosy: Olivier.Grisel, pitrou
priority: normal
severity: normal
status: open
title: pickle.dump allocates unnecessary temporary bytes / str
type: resource usage
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8
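To make the extra copy concrete, here is a schematic sketch. It is not the actual Lib/pickle.py code: the helper names are illustrative, only the opcode constant matches the real protocol 4 opcode.

```
from struct import pack

BINBYTES8 = b'\x8e'  # protocol 4 opcode: bytes object with an 8-byte length field

def save_bytes_with_copy(write, payload):
    # the concatenation allocates a temporary bytes object roughly as large
    # as the payload itself before anything reaches the file
    write(BINBYTES8 + pack("<Q", len(payload)) + payload)

def save_bytes_without_copy(write, payload):
    # two writes: the small header, then the payload, with no temporary copy
    write(BINBYTES8 + pack("<Q", len(payload)))
    write(payload)
```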
[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated
Olivier Grisel added the comment: No problem. Thanks Antoine for the review!
[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated
Olivier Grisel added the comment: New version of the patch to add an inline comment. -- Added file: http://bugs.python.org/file35841/pickle_whichmodule_20140703.patch
[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated
New submission from Olivier Grisel: `pickle.whichmodule` performs an iteration over `sys.modules` and tries to perform `getattr` calls on those modules. Unfortunately some modules such as those from the `six.moves` dynamic module can trigger imports when calling `getattr` on them, hence mutating the `sys.modules` dict and causing a `RuntimeError: dictionary changed size during iteration`.

This would also render `pickle.whichmodule` more thread-safe in case concurrent threads perform new module imports and `whichmodule` calls.

The attached patch protects the iteration by copying the dict items into a fixed list. I could write a test involving dynamic module definitions as done in `six.moves`, but it sounds very complicated for such a trivial fix.

--
components: Library (Lib)
files: pickle_whichmodule.patch
keywords: patch
messages: 222099
nosy: Olivier.Grisel
priority: normal
severity: normal
status: open
title: RuntimeError in pickle.whichmodule when sys.modules is mutated
type: crash
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file35830/pickle_whichmodule.patch
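A minimal sketch of the idea (the attached patch is the authoritative version; this only illustrates taking a snapshot of sys.modules before iterating):

```
import sys

def whichmodule(obj, name):
    module_name = getattr(obj, '__module__', None)
    if module_name is not None:
        return module_name
    # list() takes a snapshot, so a getattr-triggered import cannot mutate
    # sys.modules while we are iterating over it
    for module_name, module in list(sys.modules.items()):
        if module_name == '__main__' or module is None:
            continue
        try:
            if getattr(module, name, None) is obj:
                return module_name
        except AttributeError:
            pass
    return '__main__'
```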
[issue19946] Handle a non-importable __main__ in multiprocessing
Olivier Grisel added the comment: I applied issue19946_pep_451_multiprocessing_v2.diff and I confirm that it fixes the problem that I reported initially.
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: Why has this issue been closed? Won't the spawn and forkserver modes work in Python 3.4 for Python programs started by a Python script (which is probably the majority of programs written in Python under unix)?

Is there any reason not to use the `imp.load_source` code I put in my patch as a temporary workaround, if the cleaner runpy.run_path solution is too tricky to implement for the Python 3.4 release time frame?
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment:

> The semantics are not going to change in python 3.4 and will just stay as
> they were in Python 3.3.

Well, the semantics do change: in Python 3.3 the spawn and forkserver modes did not exist at all. The spawn mode existed but only implicitly and only under Windows. So Python 3.4 is introducing a new feature for POSIX systems that will only work in the rare cases where the Python program is launched by a .py ending script.

Would running the `imp.load_source` trick only if `sys.platform != "win32"` be a viable way to preserve the semantics of Python 3.3 under Windows while not introducing a partially broken feature in Python 3.4?
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: I can wait (or monkey-patch the stuff I need as a temporary workaround in my code). My worry is that Python 3.4 will introduce a new feature that is very crash-prone.

Take this simple program that uses the newly introduced `get_context` function (the same problem happens with `set_start_method`):

filename: mytool

```
#!/usr/bin/env python
from multiprocessing import freeze_support, get_context


def compute_stuff(i):
    # in real life you could use a lib that uses threads
    # like cuda and that would crash with the default 'fork'
    # mode under POSIX
    return i ** 2


if __name__ == "__main__":
    freeze_support()
    ctx = get_context('spawn')
    ctx.Pool(4).map(compute_stuff, range(8))
```

If you chmod +x this file and run it with ./mytool, the user will get an infinitely running process that keeps displaying on stderr:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 96, in spawn_main
    exitcode = _main(fd)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 105, in _main
    prepare(preparation_data)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 210, in prepare
    import_main_path(data['main_path'])
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 256, in import_main_path
    raise ImportError(name=main_name)
ImportError
```

... repeated over and over until the user kills the process.

Is there really nothing we can do to avoid releasing Python 3.4 with this bug?
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: For Python 3.4: maybe rather than raising ImportError, we could issue a warning to notify the users that names from the __main__ namespace could not be loaded, and make init_module_attrs return early.

This way a multiprocessing program that only calls functions defined in non-main namespaces could still use the new start method feature introduced in Python 3.4, while not changing the Python 3.3 semantics for windows programs and not relying on any deprecated hack.
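Something along these lines; the function below is only a hypothetical paraphrase of the multiprocessing.spawn machinery, not the real signature:

```
import warnings

def init_main_module_attrs(main_module, spec):
    # Hypothetical sketch: warn and return early instead of raising when the
    # parent __main__ module cannot be re-imported in the child process.
    if spec is None or spec.loader is None:
        warnings.warn(
            "could not load the parent __main__ module; only functions "
            "importable from other modules can be used as process targets")
        return
    main_module.__spec__ = spec
    main_module.__loader__ = spec.loader
```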
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: I agree that a failure to lookup the module should raise an explicit exception.

> Second, there is no way that 'nosetests' will ever succeed as an import
> since, as Oliver pointed out, it doesn't end in '.py' or any other
> identifiable way for a finder to know it can handle the file. So this is
> not a bug and simply a side-effect of how import works. The only way
> around it would be to symlink nosetests to nosetests.py or to somehow
> pre-emptively set up 'nosetests' for supported importing.

I don't agree that (unix) Python programs that don't end with .py should be modified to have multiprocessing work correctly. I think it should be multiprocessing's responsibility to transparently find out how to spawn the new process independently of the fact that the program ends in '.py' or not.

Note: the fork mode always works under unix (with or without the .py extension). The spawn mode always works under windows as AFAIK there is no way to have Python programs that don't end in .py under windows, and furthermore I think multiprocessing does execute the __main__ under windows (but I haven't tested if it's still the case in Python HEAD).
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment:

> what is sys.modules['__main__'] and sys.modules['__main__'].__file__ if
> you run under nose?

```
$ cat check_stuff.py
import sys


def test_main():
    print("sys.modules['__main__']=%r"
          % sys.modules['__main__'])
    print("sys.modules['__main__'].__file__=%r"
          % sys.modules['__main__'].__file__)


if __name__ == '__main__':
    test_main()

(pyhead) ogrisel@is146148:~/tmp$ python check_stuff.py
sys.modules['__main__']=<module '__main__' from 'check_stuff.py'>
sys.modules['__main__'].__file__='check_stuff.py'

(pyhead) ogrisel@is146148:~/tmp$ nosetests -s check_stuff.py
sys.modules['__main__']=<module '__main__' from '/volatile/ogrisel/envs/pyhead/bin/nosetests'>
sys.modules['__main__'].__file__='/volatile/ogrisel/envs/pyhead/bin/nosetests'
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
```
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: Note however that the problem is not specific to nose. If I rename my initial 'check_forkserver.py' script to 'check_forkserver', add the '#!/usr/bin/env python' header and make it executable with 'chmod +x', I get the same crash.

So the problem is related to the fact that under posix, valid Python programs can be executable scripts without the '.py' extension.
[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module
Olivier Grisel added the comment: Here is a patch that uses `imp.load_source` when the first importlib name-based lookup fails. Apparently it fixes the issue on my box but I am not sure whether this is the correct way to do it. -- keywords: +patch Added file: http://bugs.python.org/file33091/issue19946.diff
[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script
New submission from Olivier Grisel: Here is a simple python program that uses the new forkserver feature introduced in 3.4b1:

name: checkforkserver.py

```
import multiprocessing
import os


def do(i):
    print(i, os.getpid())


def test_forkserver():
    mp = multiprocessing.get_context('forkserver')
    mp.Pool(2).map(do, range(3))


if __name__ == "__main__":
    test_forkserver()
```

When running this using the "python check_forkserver.py" command, everything works as expected.

When running this using the nosetests launcher ("nosetests -s check_forkserver.py"), I get:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/forkserver.py", line 141, in main
    spawn.import_main_path(main_path)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 252, in import_main_path
    methods.init_module_attrs(main_module)
  File "<frozen importlib._bootstrap>", line 1051, in init_module_attrs
AttributeError: 'NoneType' object has no attribute 'loader'
```

Indeed, the spec variable in multiprocessing/spawn.py's import_main_path function is None as the nosetests script is not a regular python module: in particular it does not have a .py extension.

If I copy or symlink or rename the nosetests script as nosetests.py in the same folder, this works as expected.

I am not familiar enough with the importlib machinery to suggest a fix for this bug.

Also there is a typo in the comment: "causing a psuedo fork bomb" should be "causing a pseudo fork bomb".

Note: I am running CPython head updated today.

--
components: Library (Lib)
messages: 205810
nosy: Olivier.Grisel
priority: normal
severity: normal
status: open
title: multiprocessing crash with forkserver or spawn when run from a non .py ending script
versions: Python 3.4
[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script
Changes by Olivier Grisel <olivier.gri...@ensta.org>:

--
type: -> crash
[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script
Olivier Grisel added the comment:

> So the question is exactly what module is being passed to
> importlib.find_spec() and why isn't it finding a spec/loader for that
> module.

The module is the `nosetests` python script: module_name == 'nosetests' in this case. However, nosetests is not considered an importable module because of the missing '.py' extension in the filename.

> Did this code work in Python 3.3?

This code did not exist in Python 3.3.
[issue19851] reload problem with submodule
Olivier Grisel added the comment: I tested the patch on the current HEAD and it fixes a regression introduced between 3.3 and 3.4b1 that prevented building scipy from source with `pip install scipy`.

--
nosy: +Olivier.Grisel
[issue18999] Robustness issues in multiprocessing.{get, set}_start_method
Olivier Grisel added the comment: The process pool executor [1] from the concurrent.futures API would be suitable to explicitly start and stop the helper process for the `forkserver` mode.

[1] http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

The point would be to have as little state as possible encoded in the multiprocessing module (and its singletons) and to move that state information so that it is directly managed by multiprocessing Process and Pool class instances, so that libraries could customize the behavior (start_method, executable, the ForkingPickler reducers registry and so on) without mutating the state of the multiprocessing module singletons.
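As a plain usage sketch (nothing forkserver-specific, just the explicit worker lifetime the executor API provides):

```
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    # the with-block gives the worker processes an explicit lifetime:
    # they are started lazily and shut down when the block exits
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(square, range(8))))
```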
[issue18999] Robustness issues in multiprocessing.{get, set}_start_method
Olivier Grisel added the comment: Richard Oudkerk: thanks for the clarification, that makes sense. I don't have the time either in the coming month, maybe later.
[issue18999] Robustness issues in multiprocessing.{get, set}_start_method
Olivier Grisel added the comment:

> Maybe it would be better to have separate contexts for each start method.
> That way joblib could use the forkserver context without interfering with
> the rest of the user's program.

Yes, in general it would be great if libraries could customize the multiprocessing default options without impacting any of the module singletons. That also includes the ForkingPickler registry for custom reducers: right now it is a class attribute. It would be great to be able to scope custom reducer registration to a given multiprocessing.Pool or multiprocessing.Process instance.

Now how to implement that kind of isolation: it could either be done by adding new constructor parameters or new public methods to the Process and Pool classes to customize their behavior while sticking to the OOP paradigm if possible, or by using a context manager as you suggested. I am not sure which option is best. Prototyping both is probably the best way to get a feel for the tradeoffs.
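As an illustration of what such scoping could look like with a per-start-method context object (a sketch of the API direction discussed here; forkserver is POSIX-only):

```
import multiprocessing

def work(i):
    return i * i

if __name__ == "__main__":
    # the start method is a property of this context object only; the
    # module-level default and other libraries' pools are left untouched
    ctx = multiprocessing.get_context("forkserver")
    with ctx.Pool(2) as pool:
        print(pool.map(work, range(4)))
```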
[issue18999] Robustness issues in multiprocessing.{get, set}_start_method
Olivier Grisel added the comment: Related question: is there any good reason that would prevent passing a custom `start_method` kwarg to the `Pool` constructor to make it use an alternative `Popen` instance (that is, an instance different from the `multiprocessing._Popen` singleton)? This would allow libraries such as joblib to keep side effects minimal by impacting the default multiprocessing runtime as little as possible.

--
nosy: +Olivier.Grisel
[issue17560] problem using multiprocessing with really big objects?
Olivier Grisel added the comment: I have implemented a custom subclass of the multiprocessing Pool to be able to plug a custom pickling strategy for this specific use case in joblib: https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327

In particular it can:
- detect mmap-backed numpy
- transform large memory-backed numpy arrays into numpy.memmap instances prior to pickling, using the /dev/shm partition when available or TMPDIR otherwise.

Here is some doc: https://github.com/joblib/joblib/blob/master/doc/parallel_numpy.rst

I could submit the part that makes it possible to customize the picklers of a multiprocessing.pool.Pool instance to the standard library if people are interested. The numpy-specific stuff would stay in third party projects such as joblib, but at least that would make it easier for people to plug their own optimizations without having to override half of the multiprocessing class hierarchy.

--
nosy: +Olivier.Grisel
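Schematically, the mmap-specific part boils down to a reducer like the following; this assumes numpy and is a simplified illustration, not the joblib implementation:

```
import numpy as np

def reduce_memmap(a):
    # pickle only the filename and buffer metadata; the worker process
    # re-opens the memory map instead of receiving a copy of the data
    return (np.memmap, (a.filename, a.dtype, "r", a.offset, a.shape))
```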
[issue17560] problem using multiprocessing with really big objects?
Olivier Grisel added the comment: I forgot to end a sentence in my last comment:

"- detect mmap-backed numpy"

should read:

"- detect mmap-backed numpy arrays and pickle only the filename and other buffer metadata to reconstruct a mmap-backed array in the worker processes instead of copying the data around."
[issue17560] problem using multiprocessing with really big objects?
Olivier Grisel added the comment:

> In 3.3 you can do
>
>     from multiprocessing.forking import ForkingPickler
>     ForkingPickler.register(MyType, reduce_MyType)
>
> Is this sufficient for your needs? This is private (and its definition has
> moved in 3.4) but it could be made public.

Indeed, I forgot that the multiprocessing pickler was already made pluggable in Python 3.3. I needed backward compat for python 2.6 in joblib, hence I had to rewrite a bunch of the class hierarchy.
Re: segfault calling SSE enabled library from ctypes
I found an even simpler solution: use the -mstackrealign GCC option to build the shared library.

-- Olivier
Re: SWIG vs. ctypes (Was: ANN: PyEnchant 1.5.0)
2008/11/25 [EMAIL PROTECTED]:
> On Nov 25, 4:34 pm, Diez B. Roggisch [EMAIL PROTECTED] wrote:
>> You can't use ctypes for C++, only for C-style APIs.
>>
>> Diez
>
> With some work, you can convert your C++ objects to PyObject* and then
> return the latter in a function with C bindings.

http://cython.org also has some support for C++ bindings. I also heard that Boost.Python is very well suited to make full featured C++ / Python bindings, though I never tried it myself.

-- Olivier
Re: segfault calling SSE enabled library from ctypes
Replying to myself: haypo found the origin of the problem. Apparently this problem stems from a GCC bug [1] (that should be fixed on x86 as of version 4.4). The bug is that GCC does not always ensure that the stack is 16-byte aligned, hence the __m128 myvector local variable in the previous code might not be aligned.

A workaround would be to align the stack before calling the inner function, as done here: http://www.bitbucket.org/ogrisel/ctypes_sse/changeset/dc27626824b8/

New version of the previous C code:

```
#include <stdio.h>
#include <emmintrin.h>

void wrapped_dummy_sse() {
    // allocate an aligned vector of 128 bits
    __m128 myvector;

    printf("[dummy_sse] before calling setzero\n");
    fflush(stdout);

    // initialize its 4 32-bit float values to zero
    myvector = _mm_setzero_ps();

    printf("[dummysse] after calling setzero\n");
    fflush(stdout);

    // display the content of the vector
    float* part = (float*) &myvector;
    printf("[dummysse] myvector = {%f, %f, %f, %f}\n",
           part[0], part[1], part[2], part[3]);
}

void dummy_sse(void) {
    (void)__builtin_return_address(1);  // to force a call frame
    asm volatile ("andl $-16, %%esp" ::: "%esp");
    wrapped_dummy_sse();
}

int main() {
    dummy_sse();
    return 0;
}
```

[1] see e.g. http://www.mail-archive.com/gcc%40gcc.gnu.org/msg33101.html for a nice summary of the issue

Another workaround would be to allocate myvector on the heap, using malloc / posix_memalign for instance.

Best,

-- Olivier
segfault calling SSE enabled library from ctypes
Hello,

It seems that I am able to reproduce the same problem as reported earlier on this list by someone else: http://mail.python.org/pipermail/python-list/2008-October/511794.html

Similar setup: python 2.5.2 / gcc (Ubuntu 4.3.2-1ubuntu11) from Intrepid on a 32bit intel Core 2 Duo.

I can confirm this is not related to any alignment problem of data passed from python. I took care of that, and I could even reproduce the problem with the following minimal test case that does not use any external data (you can fetch the following source here: http://www.bitbucket.org/ogrisel/ctypes_sse/get/tip.gz).

=== dummysse.c ===

```
#include <stdio.h>
#include <emmintrin.h>

void dummy_sse(void) {
    // allocate an aligned vector of 128 bits
    __m128 myvector;

    printf("[dummy_sse] before calling setzero\n");
    fflush(stdout);

    // initialize its 4 32-bit float values to zero
    myvector = _mm_setzero_ps();

    printf("[dummysse] after calling setzero\n");
    fflush(stdout);

    // display the content of the vector
    float* part = (float*) &myvector;
    printf("[dummysse] myvector = {%f, %f, %f, %f}\n",
           part[0], part[1], part[2], part[3]);
}

int main() {
    dummy_sse();
    return 0;
}
```

=== dummysse.py ===

```
from ctypes import cdll

lib = cdll.LoadLibrary('./dummysse.so')
lib.dummy_sse()
```

=== Makefile ===

```
CC = gcc
CFLAGS = -Wall -g -O0 -msse2

all: dummysse dummysse.so

dummysse:
	$(CC) $(CFLAGS) $(LIBS) -o dummysse dummysse.c
	# ./dummysse

dummysse.so:
	$(CC) $(CFLAGS) $(LIBS) -shared -o dummysse.so dummysse.c
	# python dummysse.py

clean:
	rm -f dummysse dummysse.so
```

By running the main of the C program I get the expected behavior:

```
$ gcc -Wall -g -O0 -msse2 -o dummysse dummysse.c
$ ./dummysse
[dummy_sse] before calling setzero
[dummysse] after calling setzero
[dummysse] myvector = {0.00, 0.00, 0.00, 0.00}
```

Running from python, the call to _mm_setzero_ps() segfaults:

```
$ gcc -Wall -g -O0 -msse2 -shared -o dummysse.so dummysse.c
$ python dummysse.py
[dummy_sse] before calling setzero
Segmentation fault
```

Is this to be expected?

The result of a call to `valgrind python dummysse.py` is available here: http://www.bitbucket.org/ogrisel/ctypes_sse/src/tip/valgrind.log

I am not familiar with python internals at all so I cannot understand what's wrong. You can notice that valgrind makes the program run till the end and display the correct results (4 zeros) on stdout, while logging a bunch of errors (most of those are not related to our problem since they appear when launching python on an empty script).

-- Olivier
Re: How to calculate the CPU time consumption and memory consuption of any python program in Linux
gene tani wrote:
> Shahriar Shamil Uulu wrote:
>> Thank you, for your directions and advices.
>> shahriar
>
> ...also look:
> http://spyced.blogspot.com/2005/09/how-well-do-you-know-python-part-9.html
> which mentions twisted.python.reflect.findInstances(sys.modules, str) and
> objgrep, which i didn't know about

This looks relevant too (not tested though): http://pysizer.8325.org/

-- Olivier
Re: Vaults of Parnassus hasn't been updated for months
Wolfgang Grafen wrote:
> What happened to the Vaults of Parnassus? It was always my favourite
> resource for Python code since ever. The latest entry is now 8/23. It has
> been up to date for years but now... What a pity!

Everybody is using the cheeseshop now: http://cheeseshop.python.org/pypi?%3Aaction=browse

-- Olivier