New submission from Paweł Miech <pawel...@gmail.com>:

I'm porting some code from Python 2.7 to Python 3.8. Some of it uses shelve.DbfilenameShelf to store nested dictionaries containing sets. I found that, compared with Python 2.7, shelve on Python 3.8 generates files that are approximately 164 times larger on disk: the Python 3.8 file is 2 027 520 bytes, while the Python 2.7 file is 12 288 bytes.
Code sample:

Filename: test_anydbm.py

#!/usr/bin/env python
import datetime
import shelve
import sys
import time
from os import path


def main():
    print(sys.version)
    fname = 'shelf_test_{}'.format(datetime.datetime.now().isoformat())
    bucket = shelve.DbfilenameShelf(fname, "n")
    now = time.time()
    limit = 1000
    key = 'some key > some key > other'
    top_dict = {}
    to_store = {
        1: {
            'page_item_numbers': set(),
            'products_on_page': None
        }
    }
    # The same key is rewritten on every iteration with a growing value.
    for i in range(limit):
        to_store[1]['page_item_numbers'].add(i)
        top_dict[key] = to_store
        bucket[key] = top_dict
    end = time.time()
    db_file = False
    # Some dbm backends append '.db' to the file name, so try both.
    try:
        fsize = path.getsize(fname)
    except Exception as e:
        print("file not found? {}".format(e))
        try:
            fsize = path.getsize(fname + '.db')
            db_file = True
        except Exception as e:
            print("file not found? {}".format(e))
            fsize = None
    print("Stored {} in {} filesize {}".format(limit, end - now, fsize))
    print(fname)
    bucket.close()
    bucket = shelve.DbfilenameShelf(fname, flag="r")
    if db_file:
        fname += '.db'
    print("In file {} {}".format(fname, len(list(bucket.items()))))


if __name__ == '__main__':
    main()

Output of running it in a Docker image:

Dockerfile:

FROM python:2-jessie
VOLUME /scripts
CMD scripts/test_anydbm.py

2.7.16 (default, Jul 10 2019, 03:39:20)
[GCC 4.9.2]
Stored 1000 in 0.0814290046692 filesize 12288
shelf_test_2020-07-08T07:26:23.778769
In file shelf_test_2020-07-08T07:26:23.778769 1

So you can see the file size: 12 288

And now running the same thing in Python 3:

Dockerfile:

FROM python:3.8-slim-buster
VOLUME /scripts
CMD scripts/test_anydbm.py

3.8.3 (default, Jun 9 2020, 17:49:41)
[GCC 8.3.0]
Stored 1000 in 0.02681446075439453 filesize 2027520
shelf_test_2020-07-08T07:27:18.068638
In file shelf_test_2020-07-08T07:27:18.068638 1

Notice the file size: 2 027 520

Why is this happening? Is this a bug? If I'd like to fix it, do you have any ideas about what causes this?
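
One possible way to narrow this down is sketched below. It has not been run in either Docker image and is Python 3 only. It reports which dbm backend shelve picked, using dbm.whichdb, and compares the size on disk when the key is rewritten on every iteration (as in the script above) with the size when the finished value is written once. The helper names total_size and measure are made up for this illustration. Whether repeated overwrites account for the difference is an assumption this sketch is meant to test, not a confirmed cause.

```python
import dbm
import os
import pickle
import shelve
import tempfile

KEY = 'some key > some key > other'


def total_size(folder):
    """Sum the sizes of every file the dbm backend created in `folder`."""
    return sum(os.path.getsize(os.path.join(folder, name))
               for name in os.listdir(folder))


def measure(limit=1000, write_each_step=True):
    """Store the nested dict; return (backend, pickled_bytes, bytes_on_disk)."""
    numbers = set()
    value = {KEY: {1: {'page_item_numbers': numbers,
                       'products_on_page': None}}}
    with tempfile.TemporaryDirectory() as folder:
        base = os.path.join(folder, 'shelf_test')
        with shelve.open(base, flag='n') as bucket:
            for i in range(limit):
                numbers.add(i)
                if write_each_step:
                    # Overwrite the same key on every iteration.
                    bucket[KEY] = value
            if not write_each_step:
                # Write the finished value exactly once.
                bucket[KEY] = value
        # whichdb() names the module that created the file, e.g. 'dbm.gnu'.
        backend = dbm.whichdb(base)
        on_disk = total_size(folder)
    # Approximate payload size; shelve may use a different pickle protocol.
    return backend, len(pickle.dumps(value)), on_disk


if __name__ == '__main__':
    for each_step in (True, False):
        print(each_step, measure(write_each_step=each_step))
```

If the two runs report very different sizes on disk for the same pickled size, that would point at how the backend handles overwrites rather than at shelve or pickle.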
----------
components: Library (Lib)
files: test_anydbm.py
messages: 373284
nosy: Paweł Miech
priority: normal
severity: normal
status: open
title: Python 3 shelve.DbfilenameShelf is generating 164 times larger files than Python 2.7 when storing dicts
type: resource usage
versions: Python 3.8

Added file: https://bugs.python.org/file49304/test_anydbm.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41238>
_______________________________________