Hi Dom, Zyx
I’ve been looking at PoDoFo memory usage on large documents.
The PDF spec is 8.7 MB on disk, but uses around 200 MB of RAM when loaded into
a PdfMemDocument:
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
Memory usage is:
- 850,000 PdfNames using about 70 MB, which are mostly PdfDictionary keys
- 125,000 PdfObjects using about 10 MB
A lot of the PdfNames are duplicated dictionary keys that appear in most or all
objects (e.g. “Kids”, “Length”, “Parent”).
Eliminating the duplication should save a lot of memory:
- Create a single document name table, something like
std::map< std::string , PdfName > m_nameTable;
- Change TKeyMap from
// stores a PdfName in every object key: 36 bytes for sizeof(PdfName)
// + 24 bytes HeapAlloc overhead + PdfName::m_Data.length()
typedef std::map<PdfName,PdfObject*> TKeyMap;
to
// stores a pointer (4 or 8 bytes) to the shared PdfName in every object key;
// a pointer rather than PdfName&, since std::map keys cannot be references
typedef std::map<const PdfName*,PdfObject*> TKeyMap;
- When keys are added to a PdfDictionary, add them to the document
name table if they don’t exist, then add a pointer to the document name
table entry to TKeyMap
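To make the idea concrete, here is a minimal, self-contained sketch of the interning scheme. It is not PoDoFo code: Name, Object, NameTable, NamePtrLess and Dictionary are hypothetical stand-ins, and it uses a std::set for the table instead of the std::map suggested above, purely to keep the example short. The comparator orders keys by name text, so lookups do not need to touch the table.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for PdfName: only holds the name text.
struct Name {
    std::string data;
    bool operator<(const Name& rhs) const { return data < rhs.data; }
};

// Hypothetical stand-in for PdfObject.
struct Object {};

// Document-wide table holding exactly one copy of each distinct name.
class NameTable {
public:
    // Returns a pointer to the shared copy, inserting it on first use.
    // std::set never relocates its elements, so the pointer stays valid
    // for as long as the table itself is alive.
    const Name* Intern(const std::string& text) {
        return &*m_names.insert(Name{text}).first;
    }
    std::size_t Size() const { return m_names.size(); }

private:
    std::set<Name> m_names;
};

// Orders keys by name text rather than by address, so iteration order
// does not depend on where the names happen to be allocated.
struct NamePtrLess {
    bool operator()(const Name* a, const Name* b) const {
        return a->data < b->data;
    }
};

typedef std::map<const Name*, Object*, NamePtrLess> KeyMap;

// Hypothetical stand-in for PdfDictionary: each key costs one pointer.
class Dictionary {
public:
    explicit Dictionary(NameTable& table) : m_table(table) {}

    void AddKey(const std::string& key, Object* value) {
        m_keys[m_table.Intern(key)] = value;
    }

    // Looks up via a temporary probe, so a failed lookup adds nothing
    // to the shared table.
    Object* GetKey(const std::string& key) const {
        Name probe{key};
        KeyMap::const_iterator it = m_keys.find(&probe);
        return it == m_keys.end() ? nullptr : it->second;
    }

private:
    NameTable& m_table;
    KeyMap m_keys;
};
```

With this layout, two dictionaries that both contain “Kids” share a single Name in the table. One thing to watch is lifetime: the table has to outlive every dictionary that points into it, which is one of the problems a real implementation would need to handle.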
This should reduce memory usage for PdfName from 70 MB to about 4 MB in
PDF32000_2008.pdf.
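As a rough check of those figures, the arithmetic can be written out as below. The constants are the numbers quoted in this message; the function names are made up for illustration. The estimate ignores the name text itself, the std::map node overhead and the size of the shared name table, so treat it as a back-of-envelope sketch only.

```cpp
#include <cstddef>

// Figures quoted in this message.
const std::size_t kNameCount = 850000;   // PdfName instances in the document
const std::size_t kSizeofPdfName = 36;   // sizeof(PdfName)
const std::size_t kHeapOverhead = 24;    // HeapAlloc overhead per allocation

// Fixed cost of the current layout, excluding PdfName::m_Data.length().
inline std::size_t CurrentFixedBytes() {
    return kNameCount * (kSizeofPdfName + kHeapOverhead);
}

// Cost of storing one pointer per key instead of a full PdfName.
inline std::size_t PointerBytes(std::size_t pointerSize) {
    return kNameCount * pointerSize;
}
```

CurrentFixedBytes() comes to 51,000,000 bytes before any name text is counted. PointerBytes(4) is 3,400,000 bytes and PointerBytes(8) is 6,800,000 bytes, so the 4 MB figure corresponds to a 32-bit build; a 64-bit build would be closer to 7 MB.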
Is this worth doing? Can you think of any problems this might cause?
Best Regards
Mark
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL