On 27-07-13 20:21, wxjmfa...@gmail.com wrote:

> Quickly: sys.getsizeof(), in the light of what I explained.

> 1) As this FSR works with multiple encodings, it has to keep
> track of the encoding. It puts it in the overhead of the str
> class (overhead = real overhead + encoding). In such
> an absurd way that a

> >>> sys.getsizeof('€')
> 40
>
> needs 14 bytes more than a
>
> >>> sys.getsizeof('z')
> 26

> You may vary the length of the str; the problem is
> still there. Not bad for a coding scheme.

> 2) Take a look at this. Get rid of the overhead.

> >>> sys.getsizeof('b'*1000000 + 'c')
> 1000026
> >>> sys.getsizeof('b'*1000000 + '€')
> 2000040

> What does it mean? It means that Python has to
> re-encode a str whenever necessary, because
> it works with multiple encodings.

So? The same effect can be seen with other datatypes.

>>> nr = 32767
>>> sys.getsizeof(nr)
14
>>> nr += 1
>>> sys.getsizeof(nr)
16
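
Strings show the same on-demand widening. A minimal sketch
(assuming CPython 3.3+ with PEP 393; the exact byte counts vary
with version and platform, so run it rather than trust mine):

import sys

samples = [
    'z' * 10,           # all code points < 256:   1 byte per char
    '\u20ac' * 10,      # code points < 0x10000:   2 bytes per char
    '\U0001F600' * 10,  # code points above that:  4 bytes per char
]
for s in samples:
    print(ascii(s[0]), sys.getsizeof(s))

Each string stores every character at the width of its widest one;
that is the whole trade-off being complained about.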



> This FSR is not even a copy of utf-8.
> >>> len(('b'*1000000 + '€').encode('utf-8'))
> 1000003

Why should it be? Why should a Unicode string be a copy
of its utf-8 encoding? That makes as much sense as expecting
a number to be a copy of its string representation.
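
To make the analogy concrete, a minimal sketch (object sizes are
version- and platform-dependent): an int's memory footprint is not
the length of its decimal string, and a str's memory footprint is
not the length of its utf-8 encoding.

import sys

n = 1000003
print(sys.getsizeof(n), len(str(n)))    # object size vs. decimal length

s = 'b' * 1000000 + '\u20ac'
print(sys.getsizeof(s), len(s.encode('utf-8')))  # object size vs. utf-8 length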


> utf-8, or any UTF, never needs to and never spends its time
> re-encoding.

So? That Python sometimes needs to do some kind of background
processing is not a problem, whether it is garbage collection,
allocating more memory, shuffling data blocks around or re-encoding
a string. If you've got a real-world example where one of those
things noticeably slows your program down or makes it misbehave,
then you have something that is worthy of attention.
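
If someone wants to look for such a slowdown, timeit is the obvious
tool. A sketch of such a micro-benchmark (it probes only the
copy-and-widen cost of one concatenation; absolute timings depend
on build and hardware):

import timeit

setup = "base = 'b' * 1000000"
# Appending an ASCII char: the result stays 1 byte per character.
print(timeit.timeit("base + 'c'", setup=setup, number=100))
# Appending '€': the result is widened to 2 bytes per character.
print(timeit.timeit("base + '\u20ac'", setup=setup, number=100))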

Until then you are merely harboring a pet peeve.

--
Antoon Pardon
