Hi list!

I am running into a PDF that crashes PDF parsing. Unfortunately, I haven't
found a PoDoFo tool that exercises the faulty code path; it only happens in
my own tool. Essentially, I do this:

for (PoDoFo::TCIVecObjects obj_it = pdf_objects.begin();
        obj_it != pdf_objects.end(); ++obj_it) {

    const PoDoFo::PdfObject* obj = *obj_it;

    [...]

    if (obj->HasStream()) {
        const PoDoFo::PdfStream* obj_stream = obj->GetStream();
        if (obj_stream) {
            char* obj_data = NULL;
            PoDoFo::pdf_long obj_data_len = 0;
            obj_stream->GetFilteredCopy(&obj_data, &obj_data_len);
            [...]
        }
    }
}

The GetFilteredCopy() call aborts on a failed assert with the following
stack trace:

(gdb) bt
#0  0x00007f68230d7309 in PoDoFo::PdfFilter::~PdfFilter() () from target:[...]/src/libpodofo.so.0.9.5
#1  0x00007f68230d7418 in PoDoFo::PdfAscii85Filter::~PdfAscii85Filter() () from target:[...]/src/libpodofo.so.0.9.5
#2  0x00007f68230d7448 in PoDoFo::PdfAscii85Filter::~PdfAscii85Filter() () from target:[...]/src/libpodofo.so.0.9.5
#3  0x00007f68230d3341 in std::auto_ptr<PoDoFo::PdfFilter>::~auto_ptr() () from target:[...]/src/libpodofo.so.0.9.5
#4  0x00007f68230d313b in PoDoFo::PdfFilteredDecodeStream::~PdfFilteredDecodeStream() () from target:[...]/src/libpodofo.so.0.9.5
#5  0x00007f68230d31ac in PoDoFo::PdfFilteredDecodeStream::~PdfFilteredDecodeStream() () from target:[...]/src/libpodofo.so.0.9.5
#6  0x00007f68230f3a8d in std::auto_ptr<PoDoFo::PdfOutputStream>::~auto_ptr() () from target:[...]/src/libpodofo.so.0.9.5
#7  0x00007f68230f2b63 in PoDoFo::PdfStream::GetFilteredCopy(char**, long*) const () from target:[...]/src/libpodofo.so.0.9.5
#8  0x00007f681ffdfa7b in parse_string (self=0x0, args=0x7f68234c62d0, keywds=0x0) at [...]/mycode.cc:843

As before, I unfortunately cannot share the file, but after some tracing I
believe the problem is that PdfStream::GetFilteredCopy creates a "nested"
output stream via PdfFilterFactory::CreateDecodeStream. By "nested" I mean
that the stream owns further output streams, which are created in this loop:

    PdfFilteredDecodeStream* pFilter = new PdfFilteredDecodeStream( pStream, *it, false, pDictionary );
    ++it;

    while( it != filters.rend() )
    {
        pFilter = new PdfFilteredDecodeStream( pFilter, *it, true, pDictionary );
        ++it;
    }

    return pFilter;
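To make the ownership structure of this loop concrete, here is a small self-contained C++ model of it. It is a hypothetical sketch, not PoDoFo code: `OutStream`, `MemorySink`, `DecodeStage`, `BuildDecodeChain` and `DecodeThroughChain` are made-up stand-ins, the "filters" just append a tag so the decoding order is visible, and `std::unique_ptr` stands in for the `std::auto_ptr` seen in the backtrace. What it carries over from the quoted code is the reverse walk over the filter list and the ownership flag (false for the first stage, true for every stage created inside the loop).

```cpp
// Hypothetical model of a nested decode chain, not PoDoFo code.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an output stream interface.
struct OutStream {
    virtual ~OutStream() {}
    virtual void Write(const std::string& data) = 0;
    virtual void Close() = 0;
};

// Final sink that collects the decoded bytes; never owned by the chain.
struct MemorySink : OutStream {
    std::string buffer;
    void Write(const std::string& data) override { buffer += data; }
    void Close() override {}
};

int g_stages_destroyed = 0;  // counts stage destructors for illustration

// One decode stage: "decodes" by appending its tag, then forwards to next_.
// If owns_next is true it deletes next_ in its destructor, mirroring the
// false/true ownership argument in the quoted loop.
struct DecodeStage : OutStream {
    DecodeStage(OutStream* next, std::string tag, bool owns_next)
        : next_(next), tag_(std::move(tag)), owns_next_(owns_next) {}
    ~DecodeStage() override {
        ++g_stages_destroyed;
        if (owns_next_) delete next_;
    }
    void Write(const std::string& data) override { next_->Write(data + tag_); }
    void Close() override { next_->Close(); }

  private:
    OutStream* next_;
    std::string tag_;
    bool owns_next_;
};

// Walks the filter list in reverse, like the quoted loop: the innermost stage
// borrows the sink, every later stage owns the stage it wraps.
OutStream* BuildDecodeChain(const std::vector<std::string>& filters, OutStream* sink) {
    assert(!filters.empty());
    auto it = filters.rbegin();
    OutStream* outer = new DecodeStage(sink, *it, false);
    ++it;
    while (it != filters.rend()) {
        outer = new DecodeStage(outer, *it, true);
        ++it;
    }
    return outer;
}

// Owns only the outermost stage, analogous to the auto_ptr frame in the backtrace.
std::string DecodeThroughChain(const std::vector<std::string>& filters,
                               const std::string& data) {
    MemorySink sink;
    {
        std::unique_ptr<OutStream> chain(BuildDecodeChain(filters, &sink));
        chain->Write(data);
        chain->Close();
    }  // deleting the outer stage cascades through every owned stage
    return sink.buffer;
}
```

With filters `{"A", "B", "C"}` the data passes through A first and C last, and deleting only the outermost stage destroys all three stages, while the sink itself is left alone.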

That is, we enter the loop and create the nested pFilter, which owns the
previously created stream. From what I can see, this object is never
closed, so when the auto_ptr calls its destructor, we hit this assert:

PdfFilter::~PdfFilter()
{
    // Whoops! Didn't call EndEncode() before destroying the filter!
    // Note that we can't do this for the user, since EndEncode() might
    // throw and we can't safely have that in a dtor. That also means
    // we can't throw here, but must abort.
    assert( !m_pOutputStream );
}

Without a full understanding of the code, this may be more guesswork than a
useful pointer, but it seems we may not be calling EndDecode when writing
fails (or that it is never called on the nested output stream).
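The suspected failure mode can be modelled in a few lines. This is a hypothetical sketch rather than PoDoFo code: `Filter`, `DecodeStream`, `DecodeCopy` and the `fail_on_error` switch are invented for illustration, a `'!'` byte simply simulates corrupt input, and where `~PdfFilter()` calls `assert()` the model increments `g_unended_filters` so the outcome can be checked without aborting. It assumes that `EndDecode()` is only reached through `Close()` and that the stream is held by an owning smart pointer, as the `auto_ptr` frames in the backtrace suggest.

```cpp
// Hypothetical model, not PoDoFo code: a filter must be ended explicitly
// before it is destroyed, and the stream only ends it from Close().
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

int g_unended_filters = 0;  // PoDoFo asserts here; the model just counts

struct Filter {
    void BeginDecode(std::string* out) { out_ = out; }

    // Treats '!' as corrupt input and throws before writing anything.
    void DecodeBlock(const std::string& data) {
        assert(out_ != nullptr && "DecodeBlock() without BeginDecode()");
        if (data.find('!') != std::string::npos)
            throw std::runtime_error("corrupt input");
        *out_ += data;
    }

    void EndDecode() { out_ = nullptr; }  // normal completion

    // Hypothetical: give up on the output without flushing; cannot throw.
    void Fail() noexcept { out_ = nullptr; }

    ~Filter() {
        if (out_ != nullptr) ++g_unended_filters;  // cf. assert( !m_pOutputStream )
    }

  private:
    std::string* out_ = nullptr;
};

struct DecodeStream {
    DecodeStream(std::string* out, bool fail_on_error)
        : fail_on_error_(fail_on_error) {
        filter_.BeginDecode(out);
    }

    void Write(const std::string& data) {
        try {
            filter_.DecodeBlock(data);
        } catch (...) {
            if (fail_on_error_) filter_.Fail();  // hypothetical mitigation
            throw;
        }
    }

    void Close() { filter_.EndDecode(); }

  private:
    Filter filter_;
    bool fail_on_error_;
};

// Same shape as Write(); Close(); on an owned stream. If Write() throws,
// Close() never runs and the stream is destroyed during stack unwinding.
void DecodeCopy(const std::string& input, bool fail_on_error, std::string* result) {
    std::unique_ptr<DecodeStream> stream(new DecodeStream(result, fail_on_error));
    stream->Write(input);
    stream->Close();
}
```

If `Write()` throws, `Close()` is skipped, the smart pointer destroys the stream during unwinding, and the filter still holds its output pointer, which is the state `assert( !m_pOutputStream )` rejects. The `fail_on_error` path shows one conceivable mitigation: on a decode error, drop the output pointer without flushing (a `noexcept` operation) before rethrowing.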

I'm happy to do any debugging you need, but hopefully the stack trace above
helps identify the issue.
Also, if I'm invoking GetFilteredCopy in a way I shouldn't and the bug is
on my side, sorry! Please let me know so I can go be ashamed of myself for
opening this bug and fix my tool ;-)

Thanks!
-Clemens

-- 

*Clemens Kolbitsch*
Director of Engineering, Lastline Cloud and Infrastructure

www.lastline.com

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
