[issue42160] unnecessary overhead in tempfile
Eric Wolf added the comment:

    >>> timeit(os.getpid)
    0.0899073329931

Considering the reference leaks, os.getpid() seems to be the better solution.

--
___ Python tracker <https://bugs.python.org/issue42160> ___
___ Python-bugs-list mailing list ___
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42160] unnecessary overhead in tempfile
Eric Wolf added the comment:

It would be possible to allow the GC to finalize the Random instances through weak references. On the other hand, if many _RandomNameSequence instances were used temporarily, a lot of callbacks would be registered via os.register_at_fork(). Could that cause problems?
[issue42160] unnecessary overhead in tempfile
Eric Wolf added the comment: Thanks
[issue42160] unnecessary overhead in tempfile
Eric Wolf added the comment:

It seems to be insignificant; however, it would allow for easier monkey-patching (see https://bugs.python.org/issue32276). Instead of changing _Random, one could simply assign a new instance to _named_sequence.
[issue42160] unnecessary overhead in tempfile
Eric Wolf added the comment:

SystemRandom seems to be slower:

    from random import Random, SystemRandom
    from timeit import timeit

    user = Random()
    system = SystemRandom()
    characters = "abcdefghijklmnopqrstuvwxyz0123456789_"

    timeit(lambda: user.choice(characters))
    >>> 0.5491522020020057
    timeit(lambda: system.choice(characters))
    >>> 2.9195130389998667
[issue42160] unnecessary overhead in tempfile
Change by Eric Wolf:

keywords: +patch
pull_requests: +21911
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/22997
[issue42160] unnecessary overhead in tempfile
New submission from Eric Wolf:

The tempfile module contains the class _RandomNameSequence, which has the rng property. This property checks os.getpid() on every access and re-initializes the random number generator whenever the pid has changed. However, this is only necessary on systems that allow the process to be forked, and on those systems it could be handled with os.register_at_fork() instead (see the random module).
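The proposed approach can be sketched as follows (a minimal illustration, not the stdlib code; the class and attribute names here are hypothetical): seed once at construction, and re-seed only when the process actually forks, rather than comparing os.getpid() on every access.

```python
import os
import random


class RandomNameSequence:
    """Sketch: re-seed via an at-fork hook instead of a per-access
    os.getpid() comparison."""

    characters = "abcdefghijklmnopqrstuvwxyz0123456789_"

    def __init__(self):
        self._rng = random.Random()
        # os.register_at_fork() exists on POSIX only (Python 3.7+);
        # Windows cannot fork, so no hook is needed there.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._reseed)

    def _reseed(self):
        # Runs in the child after fork(), giving it independent state.
        self._rng = random.Random()

    def __iter__(self):
        return self

    def __next__(self):
        return "".join(self._rng.choices(self.characters, k=8))
```

Note that passing the bound method keeps each instance alive for the life of the process, since registered callbacks cannot be removed.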
[issue42160] unnecessary overhead in tempfile
Change by Eric Wolf:

components: Library (Lib)
nosy: Deric-W
priority: normal
severity: normal
status: open
title: unnecessary overhead in tempfile
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9
[issue10900] bz2 module fails to uncompress large files
Eric Wolf ebw...@gmail.com added the comment:

I tried the change you suggested. It still fails, but now at 572,320 bytes instead of 900,000. I'm not sure why the number of bytes read differs; I'll explore this more in a bit.

I also converted the BZ2 to GZ and used the gzip module. It fails after reading 46,628,864 bytes. The GZ file is 33GB compared to the 22GB BZ2.

I've attached the strace output. I was getting an error with the sbrk parameter, so I left it out. Let me know if there's anything else I can provide.

--
versions: +Python 2.5
Added file: http://bugs.python.org/file20961/strace_bz2.txt
___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue10900> ___
[issue10900] bz2 module fails to uncompress large files
Eric Wolf ebw...@gmail.com added the comment:

Stupid questions are always worth asking. I did check the MD5 sum earlier and just checked it again (since I copied the file from one machine to another):

    ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2
    0e3f81ef0dd415d8f90f1378666a400c  /host/full-planet-110115-1800.osm.bz2
    ebwolf@ubuntu:/opt$ cat full-planet-110115-1800.osm.bz2.md5
    0e3f81ef0dd415d8f90f1378666a400c  full-planet-110115-1800.osm.bz2

There you have it. I was able to convert the bz2 to gzip with no errors:

    bzcat full-planet-110115-1800.osm.bz2 | gzip > full-planet.osm.gz

FYI: this problem came up last year with no resolution: http://mail.python.org/pipermail/tutor/2010-February/074610.html

Thanks for looking at this. Let me know if there's anything else you'd like me to try. In general, is it best to always read the same number of bytes? And what is the best value to pass for buffering in BZ2File? I just made up something hoping it would work.

I'm still waiting on the bzcat to /dev/null.
[issue10900] bz2 module fails to uncompress large files
Eric Wolf ebw...@gmail.com added the comment:

The only problem with the theory that the file is corrupt is that at least three people have encountered exactly the same problem with three different files: http://mail.python.org/pipermail/tutor/2010-June/076343.html

Colin was using an OSM planet file from some time last year, and it quit at exactly 90 bytes.

I'm trying bzip2 -t on the file to see if it reports any problems. These things take time... the bzcat to /dev/null still hasn't completed.
[issue10900] bz2 module fails to uncompress large files
Eric Wolf ebw...@gmail.com added the comment:

I just got confirmation that OSM is using pbzip2 to generate these files, so they are multi-stream. At least that gives a final answer, but it doesn't solve my problem.

I saw this: http://bugs.python.org/issue1625
Does anyone know the current status of the patch supporting multi-stream bz2?
[issue10900] bz2 module fails to uncompress large files
Eric Wolf ebw...@gmail.com added the comment:

I'm experiencing the same thing. My script works perfectly on a 165MB file but fails after reading 900,000 bytes of a 22GB file. The script uses a buffered BZ2File.read() and is agnostic about end-of-lines; opening with "rb" does not help. It is specifically written to avoid reading too much into memory at once.

I have tested this script on:
Python 2.5.1 (r251:54863) (ESRI ArcGIS version) (WinXP 64-bit)
Python 2.7.1.4 (r271:86832) (64-bit ActiveState version) (WinXP 64-bit)
Python 2.6.4 (r264:75706) (Ubuntu 9.10 64-bit)

Check here for some really big BZ2 files: http://planet.openstreetmap.org/full-experimental/

--
nosy: +Eric.Wolf
Added file: http://bugs.python.org/file20952/OSM_Extract.py