On Mon, Aug 14, 2017 at 5:55 PM, Guillermo Polito <guillermopol...@gmail.com> wrote:

> In a full image (just bootstrapped) we have:
>
>  7.7 MB of arrays (probably in collections, we should check usages)
>  6.3 MB of methods
>  5.3 MB of ByteArrays
>  3.3 MB of ByteStrings
>

What size do you get when all those ByteStrings are written to a text file
and zipped up?
(I'd try myself but I don't have access to Pharo from where I am right now)

Perhaps on image save, all ByteStrings can be converted to
ZippedByteStrings and lazily converted back as needed.
Depending on the measured performance hit, "as needed" could be first
access, or only when the string needs to be updated, or not at all.
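To make the "convert back on first access" variant concrete, here is a minimal Playground sketch. It is only an illustration, not a proposed implementation: it assumes the String>>zipped and String>>unzipped (gzip) extensions from the Compression package are loaded, and it uses a block plus a cache variable as a stand-in for the hypothetical ZippedByteString class.

```smalltalk
| original zipped cache lazy |
"Highly repetitive sample text, so it compresses well."
original := String new: 2000 withAll: $a.

"Assumption: String>>zipped answers a gzip-compressed String."
zipped := original zipped.

"Inflate only on first access, then reuse the cached result."
cache := nil.
lazy := [ cache ifNil: [ cache := zipped unzipped ] ].

lazy value = original.          "first access inflates and caches"
lazy value == cache.            "later accesses reuse the same object"
zipped size -> original size.   "compressed vs. original size in bytes"
```

The "only when the string needs to be updated" variant would need read access on the compressed form, which this sketch does not attempt.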

Now I was trying for a rough estimate of saved space like this...

    stringsToZip := ByteString allInstances.  "do this line once only, for repeatability"
    zipStrings := OrderedCollection new.
    entryPrefix := 'a'.   "or... bbbbbbbbbbbbb"
    i := 0.
    zip := ZipArchive new.
    stringsToZip do: [ :bs |
        zipStrings add: (zip
            addDeflateString: bs
            as: entryPrefix, (i := i + 1) printString) ].
    zip writeToFileNamed: 'ByteStrings.zip'.

    uncompressedSize := compressedSize := 0.
    zipStrings do: [ :zs |
        uncompressedSize := uncompressedSize + zs uncompressedSize.
        compressedSize   := compressedSize   + zs compressedSize ].
    (uncompressedSize // 1024) -> (compressedSize // 1024).

" 2975->1365 <== entryPrefix='bbbbbbbbbbbbb'  "
" 2975->1365 <== entryPrefix='a'    "
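The totals above lump together strings that shrink and strings that grow when deflated. A rough refinement, reusing the zipStrings collection built above and only the two accessors already used there, is to count just the members that actually get smaller. The variable names shrinking and savedBytes are made up for this sketch.

```smalltalk
"Keep only the entries whose deflated form is smaller than the original."
shrinking := zipStrings select: [ :zs |
    zs compressedSize < zs uncompressedSize ].

"Sum the bytes saved across those entries only."
savedBytes := shrinking inject: 0 into: [ :sum :zs |
    sum + (zs uncompressedSize - zs compressedSize) ].

shrinking size -> zipStrings size.   "strings that benefit -> all strings"
savedBytes // 1024.                   "KB saved if only those were zipped"
```

This still ignores any per-object overhead a zipped representation would add, so it is an upper bound on the saving rather than a prediction.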

We'd need to observe which strings are converted back, and when, to determine
whether it's a real in-operation space saving and what impact it has on
performance. Has anyone done something similar before?

-----------------

Now before posting the above, I went back to chart the effect of
compression...
    strSizes := Dictionary new.
    zipSizes := Dictionary new.
    zipStrings do: [ :zs |
        strSizes at: zs uncompressedSize accumulate: zs uncompressedSize.
        zipSizes at: zs uncompressedSize accumulate: zs compressedSize ].
    strSizes keys sorted do: [ :strSize |
        Transcript cr;
            show: (strSize printStringLength: 10);
            show: ((strSizes at: strSize) printStringLength: 10);
            show: ((zipSizes at: strSize) printStringLength: 10) ]

and loaded that output into Excel to produce the attached graph, which
shows it's detrimental for strings below 100 bytes, with limited benefit
above that until the very last data point.  For comparison, here are the
last two data points (largest strings).
     strSize  uncompressed  compressed
       74498         74498       13781
     1621978       1621978      234281

and the content of that largest string starts like this...

    0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
    0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
    0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;
    0003;<control>;Cc;0;BN;;;;;N;END OF TEXT;;;;
    0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
    0005;<control>;Cc;0;BN;;;;;N;ENQUIRY;;;;
    0006;<control>;Cc;0;BN;;;;;N;ACKNOWLEDGE;;;;
    0007;<control>;Cc;0;BN;;;;;N;BELL;;;;
    0008;<control>;Cc;0;BN;;;;;N;BACKSPACE;;;;
    0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
    000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
    000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;
    000C;<control>;Cc;0;WS;;;;;N;FORM FEED (FF);;;;
    000D;<control>;Cc;0;B;;;;;N;CARRIAGE RETURN (CR);;;;

Now #pointersTo shows that (apart from the dozens of GT objects grabbing at
it) it is held by the "receiver" ivar of OpalCompiler.  What does it do there?

cheers -ben


>  2.7 MB of Bitmaps
>  1.8 MB of ByteSymbols
>
> That sums up already ~27 MB
>
>
>
>
