[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-27 Thread STINNER Victor

New submission from STINNER Victor :

It is possible to shrink PyASCIIObject.state from 32 bits to 8 bits (its type 
changes from int to char), move it to the end of the structure (swapping it 
with wstr), and pack the structure. As a result, the structure becomes 3 bytes 
smaller.
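
As a rough illustration of the layout change described above, here is a minimal sketch using hypothetical stand-in structs (they are not the real CPython definitions, and `bytes_saved` is a made-up helper). It assumes GCC or Clang, since `__attribute__((packed))` is a compiler extension. On a 32-bit ILP32 target the sizes should mirror the 24 and 21 bytes reported below; on a 64-bit target the numbers differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real CPython definitions. */

/* Original order: state is a full int placed before wstr. */
struct ascii_unpacked {
    ptrdiff_t refcnt;        /* stand-in for PyObject_HEAD */
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
    unsigned int state;
    wchar_t *wstr;
};

/* state shrunk to one byte and moved last, but not packed:
   the compiler still adds tail padding after state. */
struct ascii_reordered {
    ptrdiff_t refcnt;
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
    wchar_t *wstr;
    unsigned char state;
};

/* Same field order, packed: no tail padding is added. */
struct ascii_packed {
    ptrdiff_t refcnt;
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
    wchar_t *wstr;
    unsigned char state;
} __attribute__((packed));

/* Bytes saved per object by the packed layout. */
size_t bytes_saved(void)
{
    /* The packed struct ends right after its one-byte state field. */
    assert(sizeof(struct ascii_packed) ==
           offsetof(struct ascii_packed, state) + 1);
    return sizeof(struct ascii_unpacked) - sizeof(struct ascii_packed);
}
```

The `ascii_reordered` variant is included to show why reordering alone is not enough: without packing, the tail padding absorbs the bytes that shrinking `state` would otherwise save.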

I expect little or no performance overhead, because only the 
PyASCIIObject.state field is affected and this field is 8 bits wide.

See also issue #14419, which relies on the memory alignment of the ASCII string 
data to optimize the ASCII decoder. If I understand correctly, my patch rules 
out that optimization.
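
To make the alignment concern concrete, here is a minimal sketch of a word-at-a-time ASCII check. It is not the patch from #14419, only an illustration of the general technique; the function name `is_ascii` is an assumption. The word-sized fast path only starts once the pointer is aligned, so data that begins right after a 21-byte header has to go through the byte-by-byte prologue first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if all n bytes at p are below 0x80, else 0. */
int is_ascii(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    const unsigned char *end = p + n;

    /* Prologue: byte at a time until p is word-aligned.
       Runs zero times when the data already starts aligned. */
    while (p < end && ((uintptr_t)p % sizeof(size_t)) != 0) {
        if (*p++ & 0x80)
            return 0;
    }

    /* Mask with 0x80 in every byte of a machine word. */
    const size_t mask = (size_t)-1 / 0xFF * 0x80;

    /* Fast path: test one whole word per iteration. */
    while ((size_t)(end - p) >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, p, sizeof word);  /* avoids aliasing problems */
        if (word & mask)
            return 0;
        p += sizeof word;
    }

    /* Epilogue: remaining tail bytes. */
    while (p < end) {
        if (*p++ & 0x80)
            return 0;
    }
    return 1;
}
```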

--

Example on Linux 32 bits:

$ cat x.c
#include <Python.h>
#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, so print it with %zu rather than %u */
    printf("sizeof(PyASCIIObject)=%zu bytes\n", sizeof(PyASCIIObject));
    printf("sizeof(PyCompactUnicodeObject)=%zu bytes\n",
           sizeof(PyCompactUnicodeObject));
    printf("sizeof(PyUnicodeObject)=%zu bytes\n", sizeof(PyUnicodeObject));
    return 0;
}

# unpatched
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=24 bytes
sizeof(PyCompactUnicodeObject)=36 bytes
sizeof(PyUnicodeObject)=40 bytes

# pack the 3 structures
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=21 bytes
sizeof(PyCompactUnicodeObject)=33 bytes
sizeof(PyUnicodeObject)=37 bytes

--

We could also pack PyCompactUnicodeObject and PyUnicodeObject, but that would 
hurt performance because utf8_length, utf8, wstr_length and data would no 
longer be aligned.
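
The misalignment mentioned above can be sketched as follows. The structs are hypothetical stand-ins, not the real CPython definitions, the helper `utf8_length_misalignment` is made up for illustration, and `__attribute__((packed))` assumes GCC or Clang. Because the packed base ends on an odd byte count, every field that follows it starts at an odd offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packed PyASCIIObject. */
struct base_packed {
    ptrdiff_t refcnt;        /* stand-in for PyObject_HEAD */
    void *type;
    ptrdiff_t length;
    ptrdiff_t hash;
    wchar_t *wstr;
    unsigned char state;     /* one byte, last field */
} __attribute__((packed));

/* Hypothetical stand-in for a packed PyCompactUnicodeObject. */
struct compact_packed {
    struct base_packed base;
    ptrdiff_t utf8_length;
    char *utf8;
    ptrdiff_t wstr_length;
} __attribute__((packed));

/* Offset of utf8_length modulo its natural alignment; 0 means aligned. */
size_t utf8_length_misalignment(void)
{
    /* In a packed struct the next field starts right after the base. */
    assert(offsetof(struct compact_packed, utf8_length) ==
           sizeof(struct base_packed));
    return offsetof(struct compact_packed, utf8_length) % sizeof(ptrdiff_t);
}
```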

--
components: Interpreter Core
files: pack_pyasciiobject.patch
keywords: patch
messages: 156905
nosy: haypo, loewis, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Pack PyASCIIObject fields to reduce memory consumption of pure ASCII 
strings
versions: Python 3.3
Added file: http://bugs.python.org/file25037/pack_pyasciiobject.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-27 Thread STINNER Victor

STINNER Victor  added the comment:

iobench and stringbench results on unpatched Python:

$ ./python Tools/iobench/iobench.py -t
Preparing files...
Python 3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52) 
[GCC 4.6.1]
Unicode: PEP 393
Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid
Text unit = one character (utf8-decoded)

** Text input **

[ 400KB ] read one unit at a time... 5.4 MB/s
[ 400KB ] read 20 units at a time... 68 MB/s
[ 400KB ] read one line at a time... 174 MB/s
[ 400KB ] read 4096 units at a time... 289 MB/s

[  20KB ] read whole contents at once... 315 MB/s
[ 400KB ] read whole contents at once... 332 MB/s
[  10MB ] read whole contents at once... 292 MB/s

[ 400KB ] seek forward one unit at a time... 0.304 MB/s
[ 400KB ] seek forward 1000 units at a time... 312 MB/s

** Text append **

[  20KB ] write one unit at a time... 3.05 MB/s
[ 400KB ] write 20 units at a time... 43 MB/s
[ 400KB ] write 4096 units at a time... 554 MB/s
[  10MB ] write 1e6 units at a time... 450 MB/s

** Text overwrite **

[  20KB ] modify one unit at a time... 1.18 MB/s
[ 400KB ] modify 20 units at a time... 18.9 MB/s
[ 400KB ] modify 4096 units at a time... 400 MB/s

$ ./python stringbench/stringbench.py 
stringbench v2.0
3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52) 
[GCC 4.6.1]
2012-03-27 13:21:01.217823
bytes   unicode
(in ms) (in ms) %       comment
== case conversion -- dense
0.37    0.38    97.9    ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.38    0.38    99.3    ("where in the world is carmen san deigo?"*10).upper() (*1000)
== case conversion -- rare
0.38    0.38    99.9    ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.43    0.38    113.6   ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
== concat 20 strings of words length 4 to 15
1.76    1.69    104.2   s1+s2+s3+s4+...+s20 (*1000)
== concat two strings
0.08    0.07    107.7   "Andrew"+"Dalke" (*1000)
== count AACT substrings in DNA example
2.15    2.13    100.7   dna.count("AACT") (*10)
== count newlines
0.65    0.58    110.8   ...text.with.2000.newlines.count("\n") (*10)
== early match, single character
0.20    0.19    107.9   ("A"*1000).find("A") (*1000)
0.36    0.05    745.8   "A" in "A"*1000 (*1000)
0.18    0.19    96.4    ("A"*1000).index("A") (*1000)
0.18    0.21    85.5    ("A"*1000).partition("A") (*1000)
0.21    0.20    103.6   ("A"*1000).rfind("A") (*1000)
0.21    0.30    69.8    ("A"*1000).rindex("A") (*1000)
0.37    0.21    171.7   ("A"*1000).rpartition("A") (*1000)
0.38    0.39    98.4    ("A"*1000).rsplit("A", 1) (*1000)
0.37    0.37    100.7   ("A"*1000).split("A", 1) (*1000)
== early match, two characters
0.20    0.19    107.7   ("AB"*1000).find("AB") (*1000)
0.36    0.05    702.1   "AB" in "AB"*1000 (*1000)
0.18    0.19    96.9    ("AB"*1000).index("AB") (*1000)
0.20    0.24    83.9    ("AB"*1000).partition("AB") (*1000)
0.20    0.20    103.6   ("AB"*1000).rfind("AB") (*1000)
0.20    0.19    102.9   ("AB"*1000).rindex("AB") (*1000)
0.20    0.23    86.7    ("AB"*1000).rpartition("AB") (*1000)
0.39    0.40    97.7    ("AB"*1000).rsplit("AB", 1) (*1000)
0.40    0.42    94.4    ("AB"*1000).split("AB", 1) (*1000)
== endswith multiple characters
0.17    0.19    92.6    "Andrew".endswith("Andrew") (*1000)
== endswith multiple characters - not!
0.17    0.18    95.2    "Andrew".endswith("Anders") (*1000)
== endswith single character
0.17    0.18    92.3    "Andrew".endswith("w") (*1000)
== formatting a string type with a dict
N/A     0.91    0.0     "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
== join empty string, with 1 character sep
N/A     0.04    0.0     "A".join("") (*100)
== join empty string, with 5 character sep
N/A     0.04    0.0     "ABCDE".join("") (*100)
== join list of 100 words, with 1 character sep
1.37    1.71    80.0    "A".join(["Bob"]*100)) (*1000)
== join list of 100 words, with 5 character sep
1.50    1.86    80.8    "ABCDE".join(["Bob"]*100)) (*1000)
== join list of 26 characters, with 1 character sep
0.48    0.49    99.6    "A".join(list("ABC..Z")) (*1000)
== join list of 26 characters, with 5 character sep
0.49    0.54    91.3    "ABCDE".join(list("ABC..Z")) (*1000)
== join string with 26 characters, with 1 character sep
N/A     1.17    0.0     "A".join("ABC..Z") (*1000)
== join string with 26 characters, with 5 character sep
N/A     1.22    0.0     "ABCDE".join("ABC..Z") (*1000)
== late match, 100 characters
8.48    8.46    100.2   s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
4.19    3.50    119.

[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-27 Thread STINNER Victor

STINNER Victor  added the comment:

Compare the stringio total: 160.84 (unpatched) vs 160.53 (patched). I don't see 
any difference in the benchmark results. The small differences are just 
benchmark noise.

--


[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-27 Thread Martin v . Löwis

Martin v. Löwis  added the comment:

-1. Using packed structures may violate all kinds of expectations in extension 
modules. I consider it important that the data block of a string is 
well-aligned.

--


[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-30 Thread Jesús Cea Avión

Changes by Jesús Cea Avión :


--
nosy: +jcea


[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-30 Thread R. David Murray

R. David Murray  added the comment:

Looks like this should be closed as rejected?

--
nosy: +r.david.murray
type:  -> enhancement


[issue14422] Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings

2012-03-30 Thread STINNER Victor

STINNER Victor  added the comment:

> I consider it important that the data block of a string is well-aligned.

I suppose that it doesn't matter for latin1, but it can be a problem for UCS-2 
and UCS-4. There are more drawbacks than advantages, so I agree to close this 
issue. Let's focus instead on enabling optimizations based on memory alignment, 
like #14419 :-)
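
The latin1 versus UCS-2/UCS-4 distinction comes down to simple arithmetic, sketched below. The helper `data_is_aligned` is hypothetical. It assumes the character data starts immediately after the object header and that the allocation itself is at least 4-byte aligned. With the 32-bit sizes quoted earlier in this thread for PyCompactUnicodeObject, a 36-byte header keeps 1-, 2- and 4-byte code units aligned, while a 33-byte packed header only works for 1-byte units.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if data placed header_size bytes into an aligned allocation
   is naturally aligned for code units of unit_size bytes
   (1 = latin1, 2 = UCS-2, 4 = UCS-4). */
int data_is_aligned(size_t header_size, size_t unit_size)
{
    assert(unit_size == 1 || unit_size == 2 || unit_size == 4);
    return header_size % unit_size == 0;
}
```

For example, `data_is_aligned(33, 2)` is 0 because 33 is odd, whereas `data_is_aligned(36, 4)` is 1.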

--
resolution:  -> wont fix
status: open -> closed
