[issue42160] unnecessary overhead in tempfile

2020-10-31 Thread Eric Wolf


Eric Wolf  added the comment:

>>> from timeit import timeit
>>> import os
>>> timeit(os.getpid)
0.0899073329931

Considering the reference leaks, checking os.getpid() seems to be the better solution.

--


[issue42160] unnecessary overhead in tempfile

2020-10-31 Thread Eric Wolf


Eric Wolf  added the comment:

It would be possible to let the GC finalize the Random instances by referencing 
them only through weak references.
On the other hand, if many _RandomNameSequence instances were used temporarily, 
a lot of callbacks would accumulate via os.register_at_fork(); could that cause 
problems?
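
For illustration, a minimal sketch of the weak-reference idea (an assumption
about how it could look, not the code from the PR; make_rng is a made-up helper):

import os
import random
import weakref

def make_rng():
    """Create a Random whose at-fork callback holds only a weak reference."""
    rng = random.Random()
    ref = weakref.ref(rng)

    def reseed_after_fork():
        live = ref()              # None once the Random has been collected
        if live is not None:
            live.seed()           # reseed in the child process

    # There is no API to unregister the callback, so each temporary instance
    # still leaves a small entry behind -- the concern raised above.
    if hasattr(os, "register_at_fork"):
        os.register_at_fork(after_in_child=reseed_after_fork)
    return rng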

--


[issue42160] unnecessary overhead in tempfile

2020-10-30 Thread Eric Wolf


Eric Wolf  added the comment:

Thanks

--


[issue42160] unnecessary overhead in tempfile

2020-10-26 Thread Eric Wolf


Eric Wolf  added the comment:

The overhead seems to be insignificant; however, the change would allow for 
easier monkey-patching: https://bugs.python.org/issue32276

Instead of changing _Random, one could simply assign a new instance to 
_named_sequence.
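
A hedged sketch of that monkey-patching idea (the module global is spelled 
_name_sequence in the CPython sources I checked; both it and _RandomNameSequence 
are private implementation details and may change between versions):

import random
import tempfile

class SystemRandomNames(tempfile._RandomNameSequence):
    # Shadow the inherited rng with an os.urandom-backed generator,
    # the use case from issue32276.
    rng = random.SystemRandom()

# tempfile's helpers return this cached module global once it is set,
# so assigning a new instance is enough.
tempfile._name_sequence = SystemRandomNames()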

--


[issue42160] unnecessary overhead in tempfile

2020-10-26 Thread Eric Wolf


Eric Wolf  added the comment:

SystemRandom seems to be slower:


>>> from random import Random, SystemRandom
>>> from timeit import timeit
>>> user = Random()
>>> system = SystemRandom()
>>> characters = "abcdefghijklmnopqrstuvwxyz0123456789_"
>>> timeit(lambda: user.choice(characters))
0.5491522020020057
>>> timeit(lambda: system.choice(characters))
2.9195130389998667

--


[issue42160] unnecessary overhead in tempfile

2020-10-26 Thread Eric Wolf


Change by Eric Wolf :


--
keywords: +patch
pull_requests: +21911
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22997


[issue42160] unnecessary overhead in tempfile

2020-10-26 Thread Eric Wolf


New submission from Eric Wolf :

The tempfile module contains the class _RandomNameSequence, which has an rng 
property.
This property checks os.getpid() on every access and re-initializes the random 
number generator when the PID has changed.
However, this is only necessary on systems that allow the process to be forked, 
and on such systems it could be handled with os.register_at_fork() instead (see 
the random module).
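
For context, a rough paraphrase of the two approaches (not the exact CPython 
code; the class names are made up for illustration):

import os
from random import Random

# Current approach: the property re-checks os.getpid() on every access
# and rebuilds the generator after a fork.
class PidCheckingNameSequence:
    @property
    def rng(self):
        cur_pid = os.getpid()
        if cur_pid != getattr(self, "_rng_pid", None):
            self._rng = Random()
            self._rng_pid = cur_pid
        return self._rng

# Proposed approach: a plain attribute, reseeded from an at-fork callback,
# so the per-access getpid() check disappears.
class AtForkNameSequence:
    def __init__(self):
        self.rng = Random()
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self.rng.seed)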

--


[issue42160] unnecessary overhead in tempfile

2020-10-26 Thread Eric Wolf


Change by Eric Wolf :


--
components: Library (Lib)
nosy: Deric-W
priority: normal
severity: normal
status: open
title: unnecessary overhead in tempfile
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9


[issue10900] bz2 module fails to uncompress large files

2011-03-01 Thread Eric Wolf

Eric Wolf ebw...@gmail.com added the comment:

I tried the change you suggested. It still fails, but now at 572,320 bytes 
instead of 900,000. I'm not sure why the number of bytes read differs; I'll 
explore this more in a bit.

I also converted the BZ2 to GZ and used the gzip module. It fails after 
reading 46,628,864 bytes. The GZ file is 33GB compared to the 22GB BZ2.

I've attached the strace output. I was getting an error with the sbrk 
parameter, so I left it out. Let me know if there's anything else I can provide.

--
versions: +Python 2.5
Added file: http://bugs.python.org/file20961/strace_bz2.txt


[issue10900] bz2 module fails to uncompress large files

2011-03-01 Thread Eric Wolf

Eric Wolf ebw...@gmail.com added the comment:

Stupid questions are always worth asking. I did check the MD5 sum earlier and 
just checked it again (since I copied the file from one machine to another):

ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2 
0e3f81ef0dd415d8f90f1378666a400c  /host/full-planet-110115-1800.osm.bz2
ebwolf@ubuntu:/opt$ cat full-planet-110115-1800.osm.bz2.md5 
0e3f81ef0dd415d8f90f1378666a400c  full-planet-110115-1800.osm.bz2

There you have it. I was able to convert the bz2 to gzip with no errors:

bzcat full-planet-110115-1800.osm.bz2 | gzip > full-planet.osm.gz

FYI: This problem came up last year with no resolution:

http://mail.python.org/pipermail/tutor/2010-February/074610.html

Thanks for looking at this. Let me know if there's anything else you'd like me 
to try. In general, is it best to always read the same number of bytes? And 
what is the best value to pass for buffering in BZ2File? I just made up 
something hoping it would work.

I'm still waiting on the bzcat to /dev/null to finish.

--


[issue10900] bz2 module fails to uncompress large files

2011-03-01 Thread Eric Wolf

Eric Wolf ebw...@gmail.com added the comment:

The only problem with the theory that the file is corrupt is that at least 
three people have encountered exactly the same problem with three files:

http://mail.python.org/pipermail/tutor/2010-June/076343.html

Colin was using an OSM planet file from some time last year and it quit at 
exactly 90 bytes.

I'm trying bzip2 -t on the file to see if it reports any problems. These things 
take time... the bzcat to /dev/null still hasn't completed.

--


[issue10900] bz2 module fails to uncompress large files

2011-03-01 Thread Eric Wolf

Eric Wolf ebw...@gmail.com added the comment:

I just got confirmation that OSM is using pbzip2 to generate these files, so 
they are multi-stream. That at least gives a definitive answer, but it doesn't 
solve my problem.

I saw this: http://bugs.python.org/issue1625

Does anyone know the current status of the patch supporting multistream bz2?
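
In the meantime, a workaround sketch that restarts a bz2.BZ2Decompressor whenever 
one stream ends (a common approach for multi-stream files; this is not the patch 
from issue 1625, and iter_decompressed is just an illustrative name):

import bz2

def iter_decompressed(path, chunk_size=1024 * 1024):
    """Yield decompressed chunks from a possibly multi-stream .bz2 file."""
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                try:
                    out = decomp.decompress(chunk)
                except EOFError:
                    # The previous stream ended exactly on a chunk boundary;
                    # start a fresh decompressor on the same data.
                    decomp = bz2.BZ2Decompressor()
                    continue
                if out:
                    yield out
                # When a stream ends mid-chunk, the leftover bytes belong
                # to the next stream.
                chunk = decomp.unused_data
                if chunk:
                    decomp = bz2.BZ2Decompressor()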

--


[issue10900] bz2 module fails to uncompress large files

2011-02-28 Thread Eric Wolf

Eric Wolf ebw...@gmail.com added the comment:

I'm experiencing the same thing. My script works perfectly on a 165MB file but 
fails after reading 900,000 bytes on a 22GB file.

My script uses a buffered BZ2File.read() and is agnostic about end-of-lines. 
Opening with "rb" does not help. It is specifically written to avoid reading 
too much into memory at once.
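
Roughly, the read loop looks like this (a simplified sketch, not the attached 
OSM_Extract.py; count_bytes and the chunk size are illustrative):

import bz2

CHUNK = 8 * 1024 * 1024  # read 8 MiB at a time to keep memory use bounded

def count_bytes(path):
    """Stream a .bz2 file chunk by chunk and count the decompressed bytes."""
    total = 0
    f = bz2.BZ2File(path, "rb")
    try:
        while True:
            data = f.read(CHUNK)
            if not data:      # the failure mode reported here: read() returns
                break         # nothing long before the real end of the file
            total += len(data)
    finally:
        f.close()
    return total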

I have tested this script on:
Python 2.5.1 (r251:54863) (ESRI ArcGIS version) (WinXP 64-bit)
Python 2.7.1.4 (r271:86832) (64-bit ActiveState version) (WinXP 64-bit)
Python 2.6.4 (r264:75706) (Ubuntu 9.10 64-bit)

Check here for some really big BZ2 files:

http://planet.openstreetmap.org/full-experimental/

--
nosy: +Eric.Wolf
Added file: http://bugs.python.org/file20952/OSM_Extract.py
