New submission from Paweł Miech <pawel...@gmail.com>:

I'm porting some code from Python 2.7 to Python 3.8. Some of it uses 
shelve.DbfilenameShelf to store nested dictionaries containing sets. I found 
that, compared with Python 2.7, Python 3.8 shelve generates files that are 
approximately 164 times larger on disk. The Python 3.8 file is 2 027 520 
bytes, while the Python 2.7 file is 12 288 bytes.

Code sample:
Filename: test_anydbm.py

#!/usr/bin/env python
import datetime
import shelve
import sys
import time
from os import path


def main():
    print(sys.version)
    fname = 'shelf_test_{}'.format(datetime.datetime.now().isoformat())
    bucket = shelve.DbfilenameShelf(fname, "n")
    now = time.time()
    limit = 1000
    key = 'some key > some key > other'
    top_dict = {}
    to_store = {
        1: {
            'page_item_numbers': set(),
            'products_on_page': None
        }
    }
    for i in range(limit):
        to_store[1]['page_item_numbers'].add(i)
        top_dict[key] = to_store
        bucket[key] = top_dict
    end = time.time()
    db_file = False
    try:
        fsize = path.getsize(fname)
    except Exception as e:
        print("file not found? {}".format(e))
        try:
            fsize = path.getsize(fname + '.db')
            db_file = True
        except Exception as e:
            print("file not found? {}".format(e))
            fsize = None
    print("Stored {} in {} filesize {}".format(limit, end - now, fsize))
    print(fname)
    bucket.close()
    bucket = shelve.DbfilenameShelf(fname, flag="r")
    if db_file:
        fname += '.db'
    print("In file {} {}".format(fname, len(list(bucket.items()))))
    # Close the read-only handle as well.
    bucket.close()


if __name__ == '__main__':
    main()

Output of running it in a Docker image:

Dockerfile:
FROM python:2-jessie
VOLUME /scripts
CMD scripts/test_anydbm.py

2.7.16 (default, Jul 10 2019, 03:39:20) 
[GCC 4.9.2]
Stored 1000 in 0.0814290046692 filesize 12288
shelf_test_2020-07-08T07:26:23.778769
In file shelf_test_2020-07-08T07:26:23.778769 1


So you can see the file size: 12 288 bytes

And now running the same thing in Python 3:

Dockerfile:

FROM python:3.8-slim-buster
VOLUME /scripts
CMD scripts/test_anydbm.py

3.8.3 (default, Jun  9 2020, 17:49:41) 
[GCC 8.3.0]
Stored 1000 in 0.02681446075439453 filesize 2027520
shelf_test_2020-07-08T07:27:18.068638
In file shelf_test_2020-07-08T07:27:18.068638 1

Notice the file size: 2 027 520 bytes
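
To compare the two runs it may help to know which dbm backend each 
interpreter picked, since shelve stores its data through whatever dbm module 
is available. Below is a minimal sketch for Python 3 only (Python 2 has the 
equivalent function as whichdb.whichdb). The helper names describe_shelf and 
demo are made up for illustration and are not part of test_anydbm.py.

```python
import dbm
import os
import shelve
import tempfile


def describe_shelf(fname):
    """Return (backend_name, total_bytes) for the shelf created as fname.

    Some backends append a suffix such as '.db' or split the data across
    several files, so every file named fname or fname.<suffix> is counted.
    """
    backend = dbm.whichdb(fname)
    directory = os.path.dirname(fname) or "."
    base = os.path.basename(fname)
    total = sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name == base or name.startswith(base + ".")
    )
    return backend, total


def demo():
    # Hypothetical small shelf, only to show the helper in use.
    with tempfile.TemporaryDirectory() as tmp:
        fname = os.path.join(tmp, "shelf_test")
        bucket = shelve.DbfilenameShelf(fname, "n")
        bucket["key"] = {1: {"page_item_numbers": set(range(10))}}
        bucket.close()
        return describe_shelf(fname)


if __name__ == "__main__":
    print(demo())
```

The backend name printed depends on how the interpreter was built, so it can 
differ between the two Docker images.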

Why is this happening? Is this a bug? If I'd like to fix it, do you have any 
ideas about what causes this?
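
One thing that could be ruled out first (this is only a guess, not a 
confirmed cause): the loop in the script rewrites the same key 1000 times 
with a growing value, so a backend that does not reuse freed space might keep 
growing the file. The sketch below, with the made-up helper names shelf_bytes 
and measure, compares the on-disk size after 1000 rewrites with the size after 
a single write of the final value. Sizes are taken after close() so that 
buffered data is included.

```python
import os
import shelve
import tempfile


def shelf_bytes(directory):
    """Sum the sizes of all files the backend created in directory."""
    return sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
    )


def measure(rewrite_each_step, limit=1000):
    """Return the shelf size in bytes after storing a set of `limit` items.

    With rewrite_each_step=True the key is rewritten after every added item,
    as in the original script; otherwise it is written once at the end.
    """
    with tempfile.TemporaryDirectory() as tmp:
        bucket = shelve.DbfilenameShelf(os.path.join(tmp, "shelf_test"), "n")
        numbers = set()
        for i in range(limit):
            numbers.add(i)
            if rewrite_each_step:
                bucket["key"] = {1: {"page_item_numbers": numbers}}
        if not rewrite_each_step:
            bucket["key"] = {1: {"page_item_numbers": numbers}}
        bucket.close()
        return shelf_bytes(tmp)


if __name__ == "__main__":
    print("rewritten every step:", measure(True))
    print("written once:        ", measure(False))
```

If both numbers come out similar, repeated overwrites are probably not the 
explanation and the difference lies elsewhere.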

----------
components: Library (Lib)
files: test_anydbm.py
messages: 373284
nosy: Paweł Miech
priority: normal
severity: normal
status: open
title: Python 3 shelve.DbfilenameShelf is generating 164 times larger files 
than Python 2.7 when storing dicts
type: resource usage
versions: Python 3.8
Added file: https://bugs.python.org/file49304/test_anydbm.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41238>
_______________________________________