Hi there,

I found a memory leak in PdfMemDocument::Load() while fuzzing the
podofotxtextract tool. The attached PoC reproduces the bug.

Valgrind's output for this PoC is as follows:

root@ef3a73316728:/data/podofo-code-1849-podofo-trunk/build/crashes# valgrind --leak-check=full ../tools/podofotxtextract/podofotxtextract _xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55== Memcheck, a memory error detector
==55== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==55== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==55== Command: ../tools/podofotxtextract/podofotxtextract _xref-InvalidDataType-PdfParser-230-PdfVariant-865
==55==
WARNING: PDF Standard Violation: No /Size key was specified in the trailer
directory. Will attempt to recover.WARNING: There are more objects (700000)
in this XRef table than specified in the size key of the trailer directory
(0)!
Error: An error 20 ocurred during processing the pdf file.


PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
Callstack:
#0 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:
230
Information: Unable to load objects from file.
#1 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:
865


==55==
==55== HEAP SUMMARY:
==55==     in use at exit: 16,878,112 bytes in 7 blocks
==55==   total heap usage: 140 allocs, 133 frees, 22,497,119 bytes allocated
==55==
==55== 16,805,408 (696 direct, 16,804,712 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
==55==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55==    by 0x4D2D9D: PoDoFo::PdfMemDocument::Load(char const*, bool) (PdfMemDocument.cpp:255)
==55==    by 0x4D2BE1: PoDoFo::PdfMemDocument::PdfMemDocument(char const*, bool) (PdfMemDocument.cpp:102)
==55==    by 0x43B5EC: TextExtractor::Init(char const*) (TextExtractor.cpp:41)
==55==    by 0x44040A: main (podofotxtextract.cpp:52)
==55==
==55== LEAK SUMMARY:
==55==    definitely lost: 696 bytes in 1 blocks
==55==    indirectly lost: 16,804,712 bytes in 5 blocks
==55==      possibly lost: 0 bytes in 0 blocks
==55==    still reachable: 72,704 bytes in 1 blocks
==55==         suppressed: 0 bytes in 0 blocks
==55== Reachable blocks (those to which a pointer was found) are not shown.
==55== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==55==
==55== For counts of detected and suppressed errors, rerun with: -v
==55== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Based on the debug information below, it seems that this malformed PDF
file causes PoDoFo::PdfParser::ParseFile() to throw an exception and exit
immediately, so the memory allocated in PdfMemDocument::Load() is never
freed.

Breakpoint 3, PoDoFo::PdfParser::ParseFile (this=0x8b3a20, rDevice=...,
bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:205
205     Clear();
(gdb)
Continuing.
WARNING: PDF Standard Violation: No /Size key was specified in the trailer
directory. Will attempt to recover.WARNING: There are more objects (700000)
in this XRef table than specified in the size key of the trailer directory
(0)!

Breakpoint 4, PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
852     return GetDictionary_NoDL();
(gdb) bt
#0  PoDoFo::PdfVariant::GetDictionary (this=0x8b3110) at
/data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:852
#1  PoDoFo::PdfParser::ReadObjects (this=<optimized out>) at
/data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:971
#2  0x000000000055941f in PoDoFo::PdfParser::ParseFile (this=0x8b3a20,
rDevice=..., bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:218
#3  0x0000000000558709 in PoDoFo::PdfParser::ParseFile (this=0x8b3a20,
    pszFilename=0x7fffffffe8a8
"_xref-InvalidDataType-PdfParser-230-PdfVariant-865",
bLoadOnDemand=true)
    at /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:164
#4  0x00000000004d2de5 in PoDoFo::PdfMemDocument::Load (this=0x7fffffffe398,
    pszFilename=0x7fffffffe8a8
"_xref-InvalidDataType-PdfParser-230-PdfVariant-865",
bForUpdate=<optimized out>)
    at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:256
#5  0x00000000004d2be2 in PoDoFo::PdfMemDocument::PdfMemDocument
(this=0x7fffffffe398,
    pszFilename=0x7ffff69e7b30 <main_arena+16> "\240\065\213",
bForUpdate=false)
    at /data/podofo-code-1849-podofo-trunk/src/doc/PdfMemDocument.cpp:102
#6  0x000000000043b5ed in TextExtractor::Init (this=0x7fffffffe578,
pszInput=0x7ffff69e7b30 <main_arena+16> "\240\065\213")
    at /data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/
TextExtractor.cpp:41
#7  0x000000000044040b in main (argc=2, argv=<optimized out>)
    at /data/podofo-code-1849-podofo-trunk/tools/podofotxtextract/
podofotxtextract.cpp:52
(gdb) n
Single stepping until exit from function PoDoFo::PdfParser::ReadObjects(),
which has no line number information.
Error: An error 20 ocurred during processing the pdf file.


PoDoFo encountered an error. Error: 20 ePdfError_InvalidDataType
Callstack:
#0 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfParser.cpp:
230
Information: Unable to load objects from file.
#1 Error Source: /data/podofo-code-1849-podofo-trunk/src/base/PdfVariant.h:
865
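
The traces above place the allocation at PdfMemDocument.cpp:255 and the
ParseFile() call at PdfMemDocument.cpp:256. If the object is held in a raw
pointer and only deleted after ParseFile() returns, an exception would skip
the delete. Below is a minimal sketch of that suspected pattern and one
possible exception-safe alternative. FakeParser, LoadLeaky and LoadSafe are
hypothetical stand-ins, not PoDoFo code, and the sketch assumes a C++11
compiler for std::unique_ptr.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a parser object. It counts live instances so
// that a leak can be observed without running Valgrind.
struct FakeParser {
    static int live;
    FakeParser() { ++live; }
    ~FakeParser() { --live; }

    // Throws for "malformed" input, mimicking a parse error.
    void ParseFile(bool malformed) {
        if (malformed) {
            throw std::runtime_error("InvalidDataType");
        }
    }
};
int FakeParser::live = 0;

// Suspected leaky pattern: when ParseFile() throws, control leaves the
// function before the delete is reached.
void LoadLeaky(bool malformed) {
    FakeParser* parser = new FakeParser();
    parser->ParseFile(malformed);
    delete parser;  // skipped when an exception propagates
}

// Exception-safe variant: the unique_ptr destroys the parser during
// stack unwinding, whether or not ParseFile() throws.
void LoadSafe(bool malformed) {
    std::unique_ptr<FakeParser> parser(new FakeParser());
    parser->ParseFile(malformed);
}
```

Calling LoadLeaky(true) inside a try block leaves one FakeParser alive,
while LoadSafe(true) leaves none. A try/catch that deletes the object and
rethrows would have the same effect on a pre-C++11 compiler.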


Thanks,

------------------
Liang Cheng

Institute of Software, Chinese Academy of Sciences
4# South Fourth Street, Zhongguancun
Beijing 100190, China

Attachment: _xref-InvalidDataType-PdfParser-230-PdfVariant-865
Description: Binary data
