[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Olivier Grisel


Olivier Grisel  added the comment:

As Victor said, the `time.sleep(1.0)` might lead to Heisen failures. I am not 
sure how to write proper strong synchronization in this case but we could 
instead go for something intermediate such as the following pattern:


...
p.terminate()
p.wait()
for i in range(60):
    try:
        shared_memory.SharedMemory(name, create=False)
    except FileNotFoundError:
        # the OS successfully collected the segment as expected
        break
    time.sleep(1.0)  # wait for the OS to collect the segment
else:
    raise AssertionError(
        f"Failed to collect shared_memory segment {name}")


What do you think?

--
nosy: +Olivier.Grisel




[issue35900] Add pickler hook for the user to customize the serialization of user defined functions and types.

2019-02-13 Thread Olivier Grisel


Olivier Grisel  added the comment:

Adding such a hook would make it possible to reimplement
cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class
(instead of the slow pickle._Pickler as done currently). This would mean
rewriting most of the CloudPickler methods to only rely on a save_reduce-style
design instead of directly calling pickle._Pickler.write and
pickle._Pickler.save. This is tedious but doable.

There is however a blocker with the current way closures are set: when we
pickle a dynamically defined function (e.g. a lambda, a nested function or a
function defined in __main__), we currently use a direct call to memoize
(https://github.com/cloudpipe/cloudpickle/blob/v0.7.0/cloudpickle/cloudpickle.py#L594)
so as to be able to refer to the function itself in its own closure without
causing an infinite loop in CloudPickler.dump. This also makes it possible to
pickle mutually recursive functions.

The easiest way to avoid having to call memoize explicitly would be to be able 
to pass the full __closure__ attribute in the state dict of the reduce call. 
Indeed the save_reduce function calls memoize automatically after saving the 
reconstructor and its args but prior to saving the state:

https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L3903-L3931

It would therefore be possible to pass a (state, slotstate) tuple with the
closure in slotstate, so that it could be reconstructed at unpickling time
with a setattr:

https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L6258-L6272

However, it is currently not possible to setattr __closure__. We can only set
individual closure cell contents, which is not compatible with the setattr
state trick described above.

To summarize, we need to implement the setter function for the __closure__ 
attribute of functions and methods to make it natural to reimplement the 
CloudPickler by inheriting from _pickle.Pickler using the hook described in 
this issue.
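
For reference, here is a minimal toy illustration (not CloudPickler code) of
the (state, slotstate) mechanism described above: because save_reduce memoizes
the object after saving the reconstructor and its args but before saving the
state, a self-reference placed in the slotstate part is serialized as a memo
reference and applied with setattr at load time, so the cycle is preserved:

```python
import pickle


class Node:
    """Stand-in for a recursive function whose closure refers to itself."""

    def __reduce__(self):
        # The reconstructor and its args are saved (and the object memoized)
        # first, then the (state, slotstate) pair; the slotstate dict is
        # applied with setattr at unpickling time, after the object exists.
        return (Node, (), ({}, {"closure": self.closure}))


node = Node()
node.closure = node  # self-reference, like a function closing over itself
copy = pickle.loads(pickle.dumps(node))
assert copy.closure is copy  # the cycle survives the round-trip
```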

--
nosy: +Olivier.Grisel




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Thanks for the very helpful feedback and guidance during the review.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Shall we close this issue now that the PR has been merged to master?

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Flushing the buffer at each frame commit will cause a medium-sized write every 
64kB on average (instead of one big write at the end). So that might actually 
cause a performance regression for some users if the individual file-object 
writes induce significant overhead.

In practice though, latency-inducing file objects like filesystem-backed ones
are likely to derive from the
[BufferedWriter](https://docs.python.org/3/library/io.html#io.BufferedWriter)
base class, and the only latency we should really care about is the one
induced by the write call overhead itself, in which case the 64kB frame /
buffer size should be enough.
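
As a quick sanity check of that claim (an assumed example, not from the
issue), a filesystem-backed file object opened in binary write mode is indeed
an io.BufferedWriter instance:

```python
import io

# Regular binary files returned by open() are wrapped in a BufferedWriter.
with open("/tmp/example.bin", "wb") as f:
    print(isinstance(f, io.BufferedWriter))  # True
```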

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

> While we are here, wouldn't it be worth flushing the buffer in the C
> implementation to the disk always after committing a frame? This will save
> memory when dumping a lot of small objects.

I think it's a good idea. The C pickler would behave more like the Python
pickler. I think framing was intended this way initially. Antoine, what do you
think?

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Thanks Antoine, I updated my code to what you suggested.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-11 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Alright, I found the source of my refcounting bug. I updated the PR to include 
the C version of the dump for PyBytes.

I ran Serhiy's microbenchmarks on the C version and I could not detect any
overhead on small bytes objects, while I get a ~20x speedup (and no memory
copy) on large bytes objects as expected.

I would like to update the `write_utf8` function but I would need to find a way 
to wrap `const char* data` as a PyBytes instance without making a memory copy 
to be able to pass it to my `_Pickle_write_large_bytes`. I browsed the C-API 
documentation but I could not understand how to do that.

Also I would appreciate any feedback on the code style or things that could be 
improved in my PR.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

I have tried to implement the direct write bypass for the C version of the 
pickler but I get a segfault in a Py_INCREF on obj during the call to  
memo_put(self, obj) after the call to _Pickler_write_large_bytes.

Here is the diff of my current version of the patch:

https://github.com/ogrisel/cpython/commit/4e093ad6993616a9f16e863b72bf2d2e37bc27b4

I am new to the Python C-API so I would appreciate some help on this one.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

BTW, I am looking at the C implementation at the moment. I think I can do it.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Alright, the last version has now ~4% overhead for small bytes.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Actually, I think this can still be improved while keeping it readable. Let me 
try again :)

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

I have pushed a new version of the code that now has a 10% overhead for small 
bytes (instead of 40% previously).

It could be possible to optimize further but I think that would render the code 
much less readable so I would be tempted to keep it this way.

Please let me know what you think.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

In my last comment, I also reported the user times (time not spent in OS-level
disk access): the code of the PR is on the order of 300-400 ms while master is
around 800 ms or more.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

More benchmarks with the unix time command:

```

(py37) ogrisel@ici:~/code/cpython$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 10.677s
=> peak memory usage: 5.936 GB

real    0m11.068s
user    0m0.940s
sys     0m5.204s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 5.089s
=> peak memory usage: 5.978 GB

real    0m5.367s
user    0m0.840s
sys     0m4.660s
(py37) ogrisel@ici:~/code/cpython$ git checkout issue-31993-pypickle-dump-mem-optim
Switched to branch 'issue-31993-pypickle-dump-mem-optim'
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 6.974s
=> peak memory usage: 2.014 GB

real    0m7.300s
user    0m0.368s
sys     0m4.640s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 10.873s
=> peak memory usage: 2.014 GB

real    0m11.178s
user    0m0.324s
sys     0m5.100s
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 4.233s
=> peak memory usage: 2.014 GB

real    0m4.574s
user    0m0.396s
sys     0m4.368s
```

User time is always better in the PR than on master, but it is also much
smaller than system time (disk access) in any case. System time is much less
deterministic.

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

Note that the time difference is not significant. I reran the last command and
got:

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 4.187s
=> peak memory usage: 2.014 GB
```

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel

Olivier Grisel <olivier.gri...@ensta.org> added the comment:

I wrote a script to monitor the memory when dumping 2GB of data with python 
master (C pickler and Python pickler):

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 5.141s
=> peak memory usage: 4.014 GB
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 5.046s
=> peak memory usage: 5.955 GB
```

This is using protocol 4. Note that the C pickler is only making 1 useless 
memory copy instead of 2 for the Python pickler (one for the concatenation and 
the other because of the framing mechanism of protocol 4).

Here the output with the Python pickler fixed in python/cpython#4353:

```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done in 6.138s
=> peak memory usage: 2.014 GB
```


Basically the 2 spurious memory copies of the Python pickler with protocol 4 
are gone.

Here is the script: 
https://gist.github.com/ogrisel/0e7b3282c84ae4a581f3b9ec1d84b45a
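
For readers without access to the gist, here is a rough sketch of what such a
monitoring script can look like (an assumed example, not the linked script;
it uses resource.getrusage on Linux, where ru_maxrss is reported in kB):

```python
import pickle
import resource
import sys
import time


def peak_gb():
    # Peak resident set size of the current process (Linux reports kB).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6


print("Allocating source data...")
data = [b"\x00" * int(1e9), b"\x01" * int(1e9)]  # ~2 GB payload
print("=> peak memory usage: %0.3f GB" % peak_gb())

print("Dumping to disk...")
use_pypickle = "--use-pypickle" in sys.argv
Pickler = pickle._Pickler if use_pypickle else pickle.Pickler
t0 = time.time()
with open("/tmp/large_data.pkl", "wb") as f:
    Pickler(f, protocol=4).dump(data)
print("done in %0.3fs" % (time.time() - t0))
print("=> peak memory usage: %0.3f GB" % peak_gb())
```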

--




[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel

New submission from Olivier Grisel <olivier.gri...@ensta.org>:

I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python 
version) make unnecessary memory copies when dumping large str, bytes and 
bytearray objects.

This is caused by unnecessary concatenation of the opcode and size header with 
the large bytes payload prior to calling self.write.

For protocol 4, an additional copy is caused by the framing mechanism.

I will submit a pull request to fix the issue for the Python version. I am not
sure how to test this properly. The BigmemPickleTests seem to be skipped on my
16 GB laptop.
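
For illustration, the core idea of the fix for the pure-Python pickler is to
emit the opcode/size header and the large payload as two separate write calls
instead of concatenating them first (a hedged sketch, not the actual patch):

```python
import io


def write_large_bytes(write, payload):
    # BINBYTES opcode ('B') followed by a 4-byte little-endian length, then
    # the payload written as-is: no `header + payload` temporary copy.
    header = b"B" + len(payload).to_bytes(4, "little")
    write(header)
    write(payload)
    # Historical pattern (copies the whole payload once more):
    #     write(b"B" + len(payload).to_bytes(4, "little") + payload)


buf = io.BytesIO()
write_large_bytes(buf.write, b"x" * (10 * 1024 * 1024))  # 10 MB, no extra copy
```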

--
components: Library (Lib)
messages: 305975
nosy: Olivier.Grisel, pitrou
priority: normal
severity: normal
status: open
title: pickle.dump allocates unnecessary temporary bytes / str
type: resource usage
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8




[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-10-06 Thread Olivier Grisel

Olivier Grisel added the comment:

No problem. Thanks Antoine for the review!

--




[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-07-03 Thread Olivier Grisel

Olivier Grisel added the comment:

New version of the patch to add an inline comment.

--
Added file: http://bugs.python.org/file35841/pickle_whichmodule_20140703.patch




[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-07-02 Thread Olivier Grisel

New submission from Olivier Grisel:

`pickle.whichmodule` performs an iteration over `sys.modules` and tries to 
perform `getattr` calls on those modules. Unfortunately some modules such as 
those from the `six.moves` dynamic module can trigger imports when calling 
`getattr` on them, hence mutating the `sys.modules` dict and causing a 
`RuntimeError: dictionary changed size during iteration`.

This would also render `pickle.whichmodule` more thread-safe in case a
concurrent thread performs new module imports during `whichmodule` calls.

The attached patch protects the iteration by copying the dict items into a
fixed list.

I could write a test involving dynamic module definitions as done in
`six.moves` but it sounds very complicated for such a trivial fix.
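
The gist of the fix (a simplified sketch, not the exact patch) is to iterate
over a snapshot of sys.modules so that imports triggered by getattr() cannot
mutate the dict while it is being iterated:

```python
import sys


def whichmodule(obj, name):
    """Simplified version of pickle.whichmodule iterating over a copy."""
    for module_name, module in list(sys.modules.items()):  # snapshot
        if module_name == "__main__" or module is None:
            continue
        # getattr may trigger imports (e.g. six.moves) and mutate sys.modules,
        # but the list built above is unaffected.
        if getattr(module, name, None) is obj:
            return module_name
    return "__main__"
```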

--
components: Library (Lib)
files: pickle_whichmodule.patch
keywords: patch
messages: 222099
nosy: Olivier.Grisel
priority: normal
severity: normal
status: open
title: RuntimeError in pickle.whichmodule when sys.modules is mutated
type: crash
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file35830/pickle_whichmodule.patch




[issue19946] Handle a non-importable __main__ in multiprocessing

2013-12-16 Thread Olivier Grisel

Olivier Grisel added the comment:

I applied issue19946_pep_451_multiprocessing_v2.diff and I confirm that it 
fixes the problem that I reported initially.

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel

Olivier Grisel added the comment:

Why has this issue been closed? Won't the spawn and forkserver modes simply
fail in Python 3.4 for Python programs started via an executable script
without the .py extension (which is probably the majority of programs written
in Python under unix)?

Is there any reason not to use the `imp.load_source` code I put in my patch as 
a temporary workaround if the cleaner runpy.run_path solution is too tricky to 
implement for the Python 3.4 release time frame?

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel

Olivier Grisel added the comment:

> The semantics are not going to change in python 3.4 and will just stay as
> they were in Python 3.3.

Well, the semantics do change: in Python 3.3 the spawn and forkserver modes
did not exist as explicit start methods at all. The spawn behavior existed,
but only implicitly and only under Windows.

So Python 3.4 is introducing a new feature for POSIX systems that will only
work in the rare cases where the Python program is launched via a script whose
name ends in .py.

Would running the `imp.load_source` trick only if `sys.platform != 'win32'` be
a viable way to preserve the semantics of Python 3.3 under Windows while not
introducing a partially broken feature in Python 3.4?

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel

Olivier Grisel added the comment:

I can wait (or monkey-patch the stuff I need as a temporary workaround in my 
code). My worry is that Python 3.4 will introduce a new feature that is very 
crash-prone.

Take this simple program that uses the newly introduced `get_context` function 
(the same problem happens with `set_start_method`):

filename: mytool

#!/usr/bin/env python
from multiprocessing import freeze_support, get_context


def compute_stuff(i):
    # in real life you could use a lib that uses threads
    # like cuda and that would crash with the default 'fork'
    # mode under POSIX
    return i ** 2


if __name__ == '__main__':
    freeze_support()
    ctx = get_context('spawn')
    ctx.Pool(4).map(compute_stuff, range(8))



If you chmod +x this file and run it with ./mytool, the user will get an 
infinitely running process that keeps displaying on stderr:


Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 96, in spawn_main
    exitcode = _main(fd)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 105, in _main
    prepare(preparation_data)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 210, in prepare
    import_main_path(data['main_path'])
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 256, in import_main_path
    raise ImportError(name=main_name)
ImportError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 96, in spawn_main
    exitcode = _main(fd)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 105, in _main
    prepare(preparation_data)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 210, in prepare
    import_main_path(data['main_path'])
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 256, in import_main_path
    raise ImportError(name=main_name)
ImportError
...


until the user kills the process. Is there really nothing we can do to avoid 
releasing Python 3.4 with this bug?

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel

Olivier Grisel added the comment:

For Python 3.4:

Maybe rather than raising ImportError, we could issue a warning to notify the
users that names from the __main__ namespace could not be loaded, and make
init_module_attrs return early.

This way a multiprocessing program that only calls functions defined in
non-main namespaces could still use the new start method feature introduced
in Python 3.4, while not changing the Python 3.3 semantics for Windows
programs and not relying on any deprecated hack.

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel

Olivier Grisel added the comment:

I agree that a failure to lookup the module should raise an explicit exception.

> Second, there is no way that 'nosetests' will ever succeed as an import
> since, as Oliver pointed out, it doesn't end in '.py' or any other
> identifiable way for a finder to know it can handle the file. So this is not
> a bug and simply a side-effect of how import works. The only way around it
> would be to symlink nosetests to nosetests.py or to somehow pre-emptively
> set up 'nosetests' for supported importing.

I don't agree that (unix) Python programs that don't end with .py should be
modified to have multiprocessing work correctly. I think it should be the
responsibility of multiprocessing to transparently find out how to spawn the
new process independently of whether the program ends in '.py' or not.

Note: the fork mode always works under unix (with or without the .py
extension). The spawn mode always works under Windows, as AFAIK there is no
way to have Python programs that don't end in .py under Windows, and
furthermore I think multiprocessing does execute the __main__ under Windows
(but I have not tested whether it is still the case in Python HEAD).

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel

Olivier Grisel added the comment:

> what is sys.modules['__main__'] and sys.modules['__main__'].__file__ if you
> run under nose?

$ cat check_stuff.py 
import sys

def test_main():
    print("sys.modules['__main__']=%r"
          % sys.modules['__main__'])
    print("sys.modules['__main__'].__file__=%r"
          % sys.modules['__main__'].__file__)


if __name__ == '__main__':
    test_main()
(pyhead) ogrisel@is146148:~/tmp$ python check_stuff.py 
sys.modules['__main__']=<module '__main__' from 'check_stuff.py'>
sys.modules['__main__'].__file__='check_stuff.py'
(pyhead) ogrisel@is146148:~/tmp$ nosetests -s check_stuff.py 
sys.modules['__main__']=<module '__main__' from '/volatile/ogrisel/envs/pyhead/bin/nosetests'>
sys.modules['__main__'].__file__='/volatile/ogrisel/envs/pyhead/bin/nosetests'
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel

Olivier Grisel added the comment:

Note however that the problem is not specific to nose. If I rename my initial
'check_forserver.py' script to 'check_forserver', add the '#!/usr/bin/env
python' header and make it executable with 'chmod +x', I get the same crash.

So the problem is related to the fact that under posix, valid Python programs 
can be executable scripts without the '.py' extension.

--




[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel

Olivier Grisel added the comment:

Here is a patch that uses `imp.load_source` when the first importlib name-based 
lookup fails.

Apparently it fixes the issue on my box but I am not sure whether this is the 
correct way to do it.

--
keywords: +patch
Added file: http://bugs.python.org/file33091/issue19946.diff




[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel

New submission from Olivier Grisel:

Here is a simple python program that uses the new forkserver feature introduced 
in 3.4b1:


name: checkforkserver.py

import multiprocessing
import os


def do(i):
    print(i, os.getpid())


def test_forkserver():
    mp = multiprocessing.get_context('forkserver')
    mp.Pool(2).map(do, range(3))


if __name__ == '__main__':
    test_forkserver()


When running this using the "python check_forkserver.py" command everything
works as expected.

When running this using the nosetests launcher ("nosetests -s
check_forkserver.py"), I get:


Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/forkserver.py", line 141, in main
    spawn.import_main_path(main_path)
  File "/opt/Python-HEAD/lib/python3.4/multiprocessing/spawn.py", line 252, in import_main_path
    methods.init_module_attrs(main_module)
  File "<frozen importlib._bootstrap>", line 1051, in init_module_attrs
AttributeError: 'NoneType' object has no attribute 'loader'


Indeed, the "spec" variable in multiprocessing/spawn.py's "import_main_path"
function is None as the nosetests script is not a regular python module: in
particular it does not have a .py extension.

If I copy, symlink or rename the nosetests script as nosetests.py in the same
folder, this works as expected. I am not familiar enough with the importlib
machinery to suggest a fix for this bug.

Also there is a typo in the comment: "causing a psuedo fork bomb" should read
"causing a pseudo fork bomb".

Note: I am running CPython head updated today.

--
components: Library (Lib)
messages: 205810
nosy: Olivier.Grisel
priority: normal
severity: normal
status: open
title: multiprocessing crash with forkserver  or spawn  when run from a non 
.py ending script
versions: Python 3.4




[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel

Changes by Olivier Grisel olivier.gri...@ensta.org:


--
type:  -> crash




[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel

Olivier Grisel added the comment:

> So the question is exactly what module is being passed to
> importlib.find_spec() and why isn't it finding a spec/loader for that
> module.

The module is the `nosetests` python script. module_name == 'nosetests' in
this case. However, nosetests is not considered an importable module because
of the missing '.py' extension in the filename.

> Did this code work in Python 3.3?

This code did not exist in Python 3.3.

--




[issue19851] reload problem with submodule

2013-12-09 Thread Olivier Grisel

Olivier Grisel added the comment:

I tested the patch on the current HEAD and it fixes a regression introduced
between 3.3 and 3.4b1 that prevented building scipy from source with
"pip install scipy".

--
nosy: +Olivier.Grisel




[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel

Olivier Grisel added the comment:

The process pool executor [1] from the concurrent futures API would be suitable 
to explicitly start and stop the helper process for the `forkserver` mode.

[1] 
http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

The point would be to have as little state as possible encoded in the
multiprocessing module (and its singletons) and move that state information to
be directly managed by multiprocessing Process and Pool class instances, so
that libraries could customize the behavior (start_method, executable,
ForkingPickler reducers registry and so on) without mutating the state of the
multiprocessing module singletons.

--




[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel

Olivier Grisel added the comment:

Richard Oudkerk: thanks for the clarification, that makes sense. I don't have 
the time either in the coming month, maybe later.

--




[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-11 Thread Olivier Grisel

Olivier Grisel added the comment:

> Maybe it would be better to have separate contexts for each start method.
> That way joblib could use the forkserver context without interfering with
> the rest of the user's program.

Yes, in general it would be great if libraries could customize the
multiprocessing default options without impacting any of the module
singletons. That also includes the ForkingPickler registry for custom
reducers: right now it is a class attribute. It would be great to be able to
scope custom reducer registration to a given multiprocessing.Pool or
multiprocessing.Process instance.

Now how to implement that kind of isolation: it could either be done by adding
new constructor parameters or new public methods to the Process and Pool
classes to customize their behavior while sticking to the OOP paradigm, or by
using a context manager as you suggested.

I am not sure which option is best. Prototyping both is probably the best way
to get a feel for the tradeoffs.
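
For reference, the context-based API that eventually shipped in Python 3.4
supports this kind of isolation: a library can hold its own context object
instead of mutating the module-level default start method. A short sketch
(assuming a POSIX system where the forkserver method is available):

```python
import multiprocessing as mp

if __name__ == "__main__":
    ctx = mp.get_context("forkserver")  # library-private context
    with ctx.Pool(2) as pool:
        print(pool.map(abs, [-1, -2, -3]))  # [1, 2, 3]
```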

--




[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-10 Thread Olivier Grisel

Olivier Grisel added the comment:

Related question: is there any good reason that would prevent passing a custom
`start_method` kwarg to the `Pool` constructor to make it use an alternative
`Popen` instance (that is, an instance different from the
`multiprocessing._Popen` singleton)?

This would allow libraries such as joblib to keep side effects minimal by
impacting the default multiprocessing runtime as little as possible.

--
nosy: +Olivier.Grisel




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

I have implemented a custom subclass of the multiprocessing Pool to be able to
plug a custom pickling strategy for this specific use case in joblib:

https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327

In particular it can:

- detect mmap-backed numpy
- transform large memory backed numpy arrays into numpy.memmap instances prior 
to pickling using the /dev/shm partition when available or TMPDIR otherwise.

Here is some doc: 
https://github.com/joblib/joblib/blob/master/doc/parallel_numpy.rst

I could submit the part that makes it possible to customize the picklers of a
multiprocessing.pool.Pool instance to the standard library if people are
interested.

The numpy specific stuff would stay in third party projects such as joblib but 
at least that would make it easier for people to plug their own optimizations 
without having to override half of the multiprocessing class hierarchy.
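
To give an idea of the kind of reducer involved (a hedged sketch with the
reconstruction arguments spelled out, not the actual joblib code), pickling a
memory-mapped array can be reduced to its buffer metadata so that worker
processes re-open the same file instead of copying the data:

```python
import numpy as np


def reduce_memmap(a):
    """Reduce a numpy.memmap to its metadata; workers re-map the same file."""
    return (np.memmap, (a.filename, a.dtype, a.mode, a.offset, a.shape))

# Such a reducer can then be registered on a pluggable pickler, for instance
# the customizable pickler of the joblib pool linked above, or (in 3.3+)
# multiprocessing's ForkingPickler.register(np.memmap, reduce_memmap).
```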

--
nosy: +Olivier.Grisel




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

I forgot to end a sentence in my last comment:

- detect mmap-backed numpy

should read:

- detect mmap-backed numpy arrays and pickle only the filename and other buffer 
metadata to reconstruct a mmap-backed array in the worker processes instead of 
copying the data around.

--




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

> In 3.3 you can do
>
>     from multiprocessing.forking import ForkingPickler
>     ForkingPickler.register(MyType, reduce_MyType)
>
> Is this sufficient for your needs?  This is private (and its definition has
> moved in 3.4) but it could be made public.

Indeed, I forgot that the multiprocessing pickler was already made pluggable
in Python 3.3. I needed backward compat with Python 2.6 in joblib, hence I had
to rewrite a bunch of the class hierarchy.

--




Re: segfault calling SSE enabled library from ctypes

2008-11-30 Thread Olivier Grisel
I found an even simpler solution: use the -mstackrealign GCC option to
build the shared library.

-- 
Olivier


Re: SWIG vs. ctypes (Was: ANN: PyEnchant 1.5.0)

2008-11-25 Thread Olivier Grisel
2008/11/25  [EMAIL PROTECTED]:
> On Nov 25, 4:34 pm, Diez B. Roggisch [EMAIL PROTECTED] wrote:
>
>> You can't use ctypes for C++, only for C-style APIs.
>>
>> Diez
>
> With some work, you can convert your C++ objects to PyObject* and then
> return the latter in a function with C bindings.

http://cython.org also has some support for C++ bindings.

I also heard that Boost.Python is very well suited for building full-featured
C++ / Python bindings, though I have never tried it myself.

-- 
Olivier


Re: segfault calling SSE enabled library from ctypes

2008-11-25 Thread Olivier Grisel
Replying to myself:

haypo found the origin of the problem. Apparently this problem stems
from a GCC bug [1] (that should be fixed on x86 as of version 4.4).
The bug is that GCC does not always ensure that the stack is 16-byte
aligned, hence the __m128 "myvector" local variable in the previous
code might not be aligned. A workaround would be to align the stack
before calling the inner function as done here:

http://www.bitbucket.org/ogrisel/ctypes_sse/changeset/dc27626824b8/

New version of the previous C code:

<quote>

#include <stdio.h>
#include <emmintrin.h>


void wrapped_dummy_sse()
{
    // allocate an aligned vector of 128 bits
    __m128 myvector;

    printf("[dummy_sse] before calling setzero\n");
    fflush(stdout);

    // initialize it to 4 32 bits float valued to zeros
    myvector = _mm_setzero_ps();

    printf("[dummysse] after calling setzero\n");
    fflush(stdout);

    // display the content of the vector
    float* part = (float*) &myvector;
    printf("[dummysse] myvector = {%f, %f, %f, %f}\n",
           part[0], part[1], part[2], part[3]);
}

void dummy_sse(void)
{
    (void)__builtin_return_address(1); // to force call frame
    asm volatile ("andl $-16, %%esp" ::: "%esp");
    wrapped_dummy_sse();
}

int main()
{
    dummy_sse();
    return 0;
}

</quote>

[1] see e.g. the following for a nice summary of the issue:
http://www.mail-archive.com/gcc%40gcc.gnu.org/msg33101.html

Another workaround would be to allocate myvector on the heap using
malloc / posix_memalign for instance.

Best,

-- 
Olivier


segfault calling SSE enabled library from ctypes

2008-11-24 Thread Olivier Grisel
Hello,

It seems that I am able to reproduce the same problem as reported
earlier on this list by someone else:

  http://mail.python.org/pipermail/python-list/2008-October/511794.html

Similar setup: python 2.5.2 / gcc (Ubuntu 4.3.2-1ubuntu11) from
Intrepid on 32bit intel Core 2 Duo.  I can confirm this is not related
to any alignment problem from data passed from python, I took care of
that and I could even reproduce the problem with the following minimal
test case that does not use any external data (you can fetch the
following source here
http://www.bitbucket.org/ogrisel/ctypes_sse/get/tip.gz )

<sample>

=== dummysse.c ===

#include <stdio.h>
#include <emmintrin.h>

void dummy_sse(void)
{
    // allocate an aligned vector of 128 bits
    __m128 myvector;

    printf("[dummy_sse] before calling setzero\n");
    fflush(stdout);

    // initialize it to 4 32 bits float valued to zeros
    myvector = _mm_setzero_ps();

    printf("[dummysse] after calling setzero\n");
    fflush(stdout);

    // display the content of the vector
    float* part = (float*) &myvector;
    printf("[dummysse] myvector = {%f, %f, %f, %f}\n",
           part[0], part[1], part[2], part[3]);
}

int main()
{
    dummy_sse();
    return 0;
}

=== dummysse.py ===

from ctypes import cdll

lib = cdll.LoadLibrary('./dummysse.so')
lib.dummy_sse()


=== Makefile ===
CC = gcc
CFLAGS = -Wall -g -O0 -msse2

all: dummysse dummysse.so

dummysse:
	$(CC) $(CFLAGS) $(LIBS) -o dummysse dummysse.c
#	./dummysse

dummysse.so:
	$(CC) $(CFLAGS) $(LIBS) -shared -o dummysse.so dummysse.c
#	python dummysse.py

clean:
	rm -f dummysse dummysse.so

</sample>

By running the main of the C program I get the expected behavior:

  gcc -Wall -g -O0 -msse2  -o dummysse dummysse.c
  ./dummysse
  [dummy_sse] before calling setzero
  [dummysse] after calling setzero
  [dummysse] myvector = {0.00, 0.00, 0.00, 0.00}

Running from python, the call to the _mm_setzero_ps() segfaults:

  gcc -Wall -g -O0 -msse2  -shared -o dummysse.so dummysse.c
  python dummysse.py
  [dummy_sse] before calling setzero
  Segmentation fault

Is this to be expected? The result of a call to "valgrind python
dummysse.py" is available here:

  http://www.bitbucket.org/ogrisel/ctypes_sse/src/tip/valgrind.log

I am not familiar with Python internals at all so I cannot understand
what's wrong. You can notice that valgrind makes the program run till
the end and display the correct results (4 zeros) on stdout while
logging a bunch of errors (most of those are not related to our
problem since they appear when launching python on an empty script).

-- 
Olivier


Re: How to calculate the CPU time consumption and memory consumption of any python program in Linux

2005-12-24 Thread Olivier Grisel
gene tani wrote:
> Shahriar Shamil Uulu wrote:
>> Thank you, for your directions and advices.
>> shahriar ...
>
> also look:
>
> http://spyced.blogspot.com/2005/09/how-well-do-you-know-python-part-9.html
>
> which mentions twisted.python.reflect.findInstances(sys.modules, str)
> and objgrep, which i didn't know about
>

This looks relevant too (not tested though):

http://pysizer.8325.org/

-- 
Olivier



Re: Vaults of Parnassus hasn't been updated for months

2005-12-23 Thread Olivier Grisel
Wolfgang Grafen wrote:
> What happened to the Vaults of Parnassus? It was always my
> favourite resource for Python code since ever. The latest
> entry is now 8/23. It has been up to date for years but now...
> What a pity!

Everybody is using the cheeseshop now:

http://cheeseshop.python.org/pypi?%3Aaction=browse

-- 
Olivier
