Hi there,

I found a memory leak in PdfMemDocument::Load() while fuzzing the 
podofotxtextract tool. Attached is a PoC that reproduces the bug. 


Valgrind's output for this PoC is as follows:


root@ef3a73316728:/data/podofo-code-1849-podofo-trunk/build/crashes# valgrind --leak-check=full ../tools/podofotxtextract/podofotxtextract _xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55== Memcheck, a memory error detector
==55== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==55== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==55== Command: ../tools/podofotxtextract/podofotxtextract 
_xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55==
WARNING: PDF Standard Violation: No /Size key was specified in the trailer 
directory. Will attempt to recover.WARNING: There are more objects (700000) in 
this XRef table than specified in the size key of the trailer directory (0)!
Error: An error 20 ocurred during processing the pdf file.




PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
        Callstack:
        #0 Error Source: 
/data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:230
                Information: Unable to load objects from file.
        #1 Error Source: 
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:865




==55==
==55== HEAP SUMMARY:
==55==     in use at exit: 16,878,112 bytes in 7 blocks
==55==   total heap usage: 140 allocs, 133 frees, 22,497,119 bytes allocated
==55==
==55== 16,805,408 (696 direct, 16,804,712 indirect) bytes in 1 blocks are 
definitely lost in loss record 7 of 7
==55==    at 0x4C2E0EF: operator new(unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55==    by 0x4D2D9D: PoDoFo::PdfMemDocument::Load(char const*, bool) 
(PdfMemDocument.cpp:255)
==55==    by 0x4D2BE1: PoDoFo::PdfMemDocument::PdfMemDocument(char const*, 
bool) (PdfMemDocument.cpp:102)
==55==    by 0x43B5EC: TextExtractor::Init(char const*) (TextExtractor.cpp:41)
==55==    by 0x44040A: main (podofotxtextract.cpp:52)
==55==
==55== LEAK SUMMARY:
==55==    definitely lost: 696 bytes in 1 blocks
==55==    indirectly lost: 16,804,712 bytes in 5 blocks
==55==      possibly lost: 0 bytes in 0 blocks
==55==    still reachable: 72,704 bytes in 1 blocks
==55==         suppressed: 0 bytes in 0 blocks
==55== Reachable blocks (those to which a pointer was found) are not shown.
==55== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==55==
==55== For counts of detected and suppressed errors, rerun with: -v
==55== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)


Based on the following debug information, it seems that this malformed PDF 
file causes PoDoFo::PdfParser::ParseFile() to throw an exception, so 
PdfMemDocument::Load() exits without deleting the memory it allocated 
(the block allocated at PdfMemDocument.cpp:255 in the Valgrind report above).


Breakpoint 3, PoDoFo::PdfParser::ParseFile (this=0x8b3a20, rDevice=..., 
bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:205
205         Clear();
(gdb)
Continuing.
WARNING: PDF Standard Violation: No /Size key was specified in the trailer 
directory. Will attempt to recover.WARNING: There are more objects (700000) in 
this XRef table than specified in the size key of the trailer directory (0)!


Breakpoint 4, PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at 
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
852         return GetDictionary_NoDL();
(gdb) bt
#0  PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at 
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
#1  PoDoFo::PdfParser::ReadObjects (this=<optimized out>) at 
/data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:971
#2  0x000000000055941f in PoDoFo::PdfParser::ParseFile (this=0x8b3a20, 
rDevice=..., bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:218
#3  0x0000000000558709 in PoDoFo::PdfParser::ParseFile (this=0x8b3a20,
    pszFilename=0x7fffffffe8a8 
"_xref-InvalidDataType-PdfParser-230-PdfVariant-865", bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:164
#4  0x00000000004d2de5 in PoDoFo::PdfMemDocument::Load (this=0x7fffffffe398,
    pszFilename=0x7fffffffe8a8 
"_xref-InvalidDataType-PdfParser-230-PdfVariant-865", bForUpdate=<optimized 
out>)
    at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:256
#5  0x00000000004d2be2 in PoDoFo::PdfMemDocument::PdfMemDocument 
(this=0x7fffffffe398,
    pszFilename=0x7ffff69e7b30 <main_arena+16> "\240\065\213", bForUpdate=false)
    at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:102
#6  0x000000000043b5ed in TextExtractor::Init (this=0x7fffffffe578, 
pszInput=0x7ffff69e7b30 <main_arena+16> "\240\065\213")
    at 
/data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/TextExtractor.cpp:41
#7  0x000000000044040b in main (argc=2, argv=<optimized out>)
    at 
/data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/podofotxtextract.cpp:52
(gdb) n
Single stepping until exit from function PoDoFo::PdfParser::ReadObjects(),
which has no line number information.
Error: An error 20 ocurred during processing the pdf file.




PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
        Callstack:
        #0 Error Source: 
/data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:230
                Information: Unable to load objects from file.
        #1 Error Source: 
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:865
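
To illustrate the pattern I suspect, here is a minimal, self-contained sketch. 
It is not the PoDoFo source: Parser, LoadLeaky, LoadSafe and CountLeaksAfter 
are hypothetical stand-ins, and I am only assuming that Load() allocates the 
parser with a raw new and deletes it after ParseFile() returns. Under that 
assumption, an exception from ParseFile() skips the delete, and holding the 
object in a smart pointer would be one possible way to avoid the leak.

```cpp
#include <cassert>    // used by the checks mentioned in the note below
#include <memory>
#include <stdexcept>

// Counts Parser objects that are currently alive (hypothetical bookkeeping).
static int g_live = 0;

// Hypothetical stand-in for a parser; not the real PoDoFo::PdfParser.
struct Parser {
    Parser()  { ++g_live; }
    ~Parser() { --g_live; }

    // Throws on malformed input, similar in spirit to the
    // ePdfError_InvalidDataType error seen in the report.
    void ParseFile(bool malformed) {
        if (malformed) {
            throw std::runtime_error("InvalidDataType");
        }
    }
};

// Suspected leaky pattern: the exception propagates before delete runs.
void LoadLeaky(bool malformed) {
    Parser* parser = new Parser();
    parser->ParseFile(malformed);  // throws here on malformed input
    delete parser;                 // never reached when ParseFile throws
}

// Possible fix: the unique_ptr destructor runs during stack unwinding.
void LoadSafe(bool malformed) {
    std::unique_ptr<Parser> parser(new Parser());
    parser->ParseFile(malformed);  // parser is still freed if this throws
}

// Calls load() with malformed input, swallows the exception, and returns
// how many Parser objects were left alive by that call.
int CountLeaksAfter(void (*load)(bool)) {
    const int before = g_live;
    try {
        load(true);
    } catch (const std::runtime_error&) {
        // Expected: the malformed input makes ParseFile throw.
    }
    return g_live - before;
}
```

With these definitions, CountLeaksAfter(LoadLeaky) returns 1 and 
CountLeaksAfter(LoadSafe) returns 0. A try/catch in Load() that deletes the 
parser and rethrows would have the same effect as the smart pointer.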





Thanks,


------------------
Liang Cheng


Institute of Software, Chinese Academy of Sciences
4# South Fourth Street, Zhongguancun
Beijing 100190, China

Attachment: _xref-InvalidDataType-PdfParser-230-PdfVariant-865
Description: Binary data
