Hi there,

I found a memory leak in PdfMemDocument::Load() while fuzzing the podofotxtextract tool. The attached PoC reproduces the bug.
Valgrind's output for this PoC is as follows:

root@ef3a73316728:/data/podofo-code-1849-podofo-trunk/build/crashes# valgrind --leak-check=full ../tools/podofotxtextract/podofotxtextract _xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55== Memcheck, a memory error detector
==55== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==55== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==55== Command: ../tools/podofotxtextract/podofotxtextract _xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55==
WARNING: PDF Standard Violation: No /Size key was specified in the trailer directory. Will attempt to recover.WARNING: There are more objects (700000) in this XRef table than specified in the size key of the trailer directory (0)!
Error: An error 20 ocurred during processing the pdf file.
PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
Callstack:
#0 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp: 230
   Information: Unable to load objects from file.
#1 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h: 865
==55==
==55== HEAP SUMMARY:
==55== in use at exit: 16,878,112 bytes in 7 blocks
==55== total heap usage: 140 allocs, 133 frees, 22,497,119 bytes allocated
==55==
==55== 16,805,408 (696 direct, 16,804,712 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
==55== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55== by 0x4D2D9D: PoDoFo::PdfMemDocument::Load(char const*, bool) (PdfMemDocument.cpp:255)
==55== by 0x4D2BE1: PoDoFo::PdfMemDocument::PdfMemDocument(char const*, bool) (PdfMemDocument.cpp:102)
==55== by 0x43B5EC: TextExtractor::Init(char const*) (TextExtractor.cpp:41)
==55== by 0x44040A: main (podofotxtextract.cpp:52)
==55==
==55== LEAK SUMMARY:
==55== definitely lost: 696 bytes in 1 blocks
==55== indirectly lost: 16,804,712 bytes in 5 blocks
==55== possibly lost: 0 bytes in 0 blocks
==55== still reachable: 72,704 bytes in 1 blocks
==55== suppressed: 0 bytes in 0 blocks
==55== Reachable blocks (those to which a pointer was found) are not shown.
==55== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==55==
==55== For counts of detected and suppressed errors, rerun with: -v
==55== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Based on the debug information below, it seems that this malformed PDF file causes PoDoFo::PdfParser::ParseFile() to throw an exception, so PdfMemDocument::Load() exits immediately without deleting the memory it allocated.

Breakpoint 3, PoDoFo::PdfParser::ParseFile (this=0x8b3a20, rDevice=..., bLoadOnDemand=true) at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:205
205 Clear();
(gdb)
Continuing.
WARNING: PDF Standard Violation: No /Size key was specified in the trailer directory. Will attempt to recover.WARNING: There are more objects (700000) in this XRef table than specified in the size key of the trailer directory (0)!

Breakpoint 4, PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
852 return GetDictionary_NoDL();
(gdb) bt
#0 PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
#1 PoDoFo::PdfParser::ReadObjects (this=<optimized out>) at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:971
#2 0x000000000055941f in PoDoFo::PdfParser::ParseFile (this=0x8b3a20, rDevice=..., bLoadOnDemand=true) at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:218
#3 0x0000000000558709 in PoDoFo::PdfParser::ParseFile (this=0x8b3a20, pszFilename=0x7fffffffe8a8 "_xref-InvalidDataType-PdfParser-230-PdfVariant-865", bLoadOnDemand=true) at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:164
#4 0x00000000004d2de5 in PoDoFo::PdfMemDocument::Load (this=0x7fffffffe398, pszFilename=0x7fffffffe8a8 "_xref-InvalidDataType-PdfParser-230-PdfVariant-865", bForUpdate=<optimized out>) at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:256
#5 0x00000000004d2be2 in PoDoFo::PdfMemDocument::PdfMemDocument (this=0x7fffffffe398, pszFilename=0x7ffff69e7b30 <main_arena+16> "\240\065\213", bForUpdate=false) at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:102
#6 0x000000000043b5ed in TextExtractor::Init (this=0x7fffffffe578, pszInput=0x7ffff69e7b30 <main_arena+16> "\240\065\213") at /data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/TextExtractor.cpp:41
#7 0x000000000044040b in main (argc=2, argv=<optimized out>) at /data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/podofotxtextract.cpp:52
(gdb) n
Single stepping until exit from function PoDoFo::PdfParser::ReadObjects(), which has no line number information.
Error: An error 20 ocurred during processing the pdf file.
PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
Callstack:
#0 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp: 230
   Information: Unable to load objects from file.
#1 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h: 865

Thanks,
------------------
Liang Cheng
Institute of Software, Chinese Academy of Sciences
4# South Fourth Street, Zhongguancun
Beijing 100190, China
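
For illustration only, here is a minimal sketch of the leak pattern suspected in this report: an object is created with operator new (Valgrind points at PdfMemDocument.cpp:255), and the ParseFile() call on the next line throws before anything frees it. This is not the PoDoFo source. Parser, LoadLeaky, LoadSafe and LeakedAfterFailure are hypothetical names made up for the sketch, and holding the object in a std::unique_ptr is just one possible way to make the error path release it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parser object; NOT the PoDoFo class.
struct Parser {
    static int live;  // number of Parser objects currently alive
    Parser() { ++live; }
    ~Parser() { --live; }
    Parser(const Parser&) = delete;
    Parser& operator=(const Parser&) = delete;

    // Simulates a parse that throws on malformed input (here: an empty name).
    void ParseFile(const std::string& name) {
        if (name.empty()) throw std::runtime_error("invalid data type");
    }
};
int Parser::live = 0;

// Leaky shape: raw new, then a call that may throw, delete only on success.
void LoadLeaky(const std::string& name) {
    Parser* p = new Parser();
    p->ParseFile(name);  // if this throws, the delete below is never reached
    delete p;
}

// Exception-safe shape: the unique_ptr destructor runs during stack unwinding.
void LoadSafe(const std::string& name) {
    std::unique_ptr<Parser> p(new Parser());
    p->ParseFile(name);
}

// Calls `load` on a malformed input and returns how many Parser objects
// are still alive afterwards that were not alive before.
template <typename F>
int LeakedAfterFailure(F load) {
    const int before = Parser::live;
    try {
        load("");
    } catch (const std::runtime_error&) {
        // Swallow the simulated parse error; only the object count matters.
    }
    return Parser::live - before;
}
```

Calling LeakedAfterFailure(LoadLeaky) reports one object left behind, while LeakedAfterFailure(LoadSafe) reports none. Whether the real fix should use a smart pointer or a try/catch that deletes the object before rethrowing is up to the maintainers.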
Attachment: _xref-InvalidDataType-PdfParser-230-PdfVariant-865 (binary data)
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users