On Mon, Aug 14, 2017 at 5:55 PM, Guillermo Polito <guillermopol...@gmail.com > wrote:
> In a full image (just bootstrapped) we have:
>
> 7.7 MB of arrays (probably in collections, we should check usages)
> 6.3 MB of methods
> 5.3 MB of ByteArrays
> 3.3 MB of ByteStrings
>

What size do you get when all those ByteStrings are written to a text file and zipped up? (I'd try myself but I don't have access to Pharo from where I am right now.)

Perhaps on image save, all ByteStrings can be converted to ZippedByteStrings and lazily converted back as needed. Depending on the measured performance hit, "as needed" could be first access, or only when the string needs to be updated, or not at all.

Now I was trying for a rough estimate of saved space like this...

    stringsToZip := ByteString allInstances. "do this line once only for repeatability"
    zipStrings := OrderedCollection new.
    entryPrefix := 'a'. "or... bbbbbbbbbbbbb"
    i := 0.
    zip := ZipArchive new.
    stringsToZip do: [ :bs |
        zipStrings add: (zip addDeflateString: bs as: entryPrefix, (i := i + 1) printString) ].
    zip writeToFileNamed: 'ByteStrings.zip'.
    uncompressedSize := compressedSize := 0.
    zipStrings do: [ :zs |
        uncompressedSize := uncompressedSize + zs uncompressedSize.
        compressedSize := compressedSize + zs compressedSize ].
    (uncompressedSize // 1024) -> (compressedSize // 1024).
    "  2975->1365  <== entryPrefix='bbbbbbbbbbbbb' "
    "  2975->1365  <== entryPrefix='a' "

We'd need to observe which strings are converted back, and when, to determine whether it's a real in-operation space saving and what impact it has on performance. Has anyone done something similar before?

-----------------

Now before posting the above, I went back to chart the effect of compression...

    strSizes := Dictionary new.
    zipSizes := Dictionary new.
    zipStrings do: [ :zs |
        strSizes at: zs uncompressedSize accumulate: zs uncompressedSize.
        zipSizes at: zs uncompressedSize accumulate: zs compressedSize ].
    strSizes keys sorted do: [ :strSize |
        Transcript cr;
            show: (strSize printStringLength: 10);
            show: ((strSizes at: strSize) printStringLength: 10);
            show: ((zipSizes at: strSize) printStringLength: 10) ]

and loaded that output into Excel to produce the attached graph, which shows it's detrimental for strings below 100 bytes, with limited benefit above that until the very last data point.

For comparison, here are the last two data points (largest strings):

         74498     74498     13781
       1621978   1621978    234281

and the content of that largest string starts like this...

    0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
    0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;
    0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;
    0003;<control>;Cc;0;BN;;;;;N;END OF TEXT;;;;
    0004;<control>;Cc;0;BN;;;;;N;END OF TRANSMISSION;;;;
    0005;<control>;Cc;0;BN;;;;;N;ENQUIRY;;;;
    0006;<control>;Cc;0;BN;;;;;N;ACKNOWLEDGE;;;;
    0007;<control>;Cc;0;BN;;;;;N;BELL;;;;
    0008;<control>;Cc;0;BN;;;;;N;BACKSPACE;;;;
    0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
    000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
    000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;
    000C;<control>;Cc;0;WS;;;;;N;FORM FEED (FF);;;;
    000D;<control>;Cc;0;B;;;;;N;CARRIAGE RETURN (CR);;;;

Now #pointersTo shows that (apart from the dozens of GT objects grabbing at it) it is held by the "receiver" ivar of OpalCompiler. What does it do there?

cheers -ben

> 2.7 MB of Bitmaps
> 1.8 MB of ByteSymbols
>
> That sums up already ~27 MB
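
P.S. The small-string penalty can be sanity-checked in a playground without building a whole archive. This is only a sketch, and it assumes String>>zipped (the gzip convenience method from the Compression package) is present in the image. Gzip framing is not the same as a deflated zip entry, so the byte counts will not match the figures above; only the trend should carry over. The variable names are made up for illustration.

```smalltalk
"Sketch: compare raw size against gzip-compressed size.
 Assumes String>>zipped exists in this image (Compression package)."
| short long |
short := 'hello'.
long := String new: 10000 withAll: $a.    "highly repetitive, compresses well"
{ short size -> short zipped size.        "short string: fixed overhead dominates"
  long size -> long zipped size }         "long string: compression wins"
```

A per-string size threshold (convert only strings above some length) would be one way to avoid the loss on short strings, though where to put that threshold would have to be measured.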