Re: [Podofo-users] Future ABI stability of PoDoFo

2022-05-10 Thread Mark Rogers
Hi

I don't think any of the main compilers have complete C++20 support yet:

GCC partial C++20 support
https://gcc.gnu.org/projects/cxx-status.html#cxx20 

Clang partial C++20 support
https://clang.llvm.org/cxx_status.html#cxx20 

VC++ partial C++20 support
https://devblogs.microsoft.com/cppblog/msvc-cpp20-and-the-std-cpp20-switch/ 

Additionally some defects in the C++20 standard are still being discovered:
https://devblogs.microsoft.com/cppblog/msvc-cpp20-and-the-std-cpp20-switch/#iso-c20-continuing-work-defect-reports-and-clarifications

It might be ok to use selected C++20 features, but how easy is it to identify 
which parts of the C++20 standard are stable and are available across the main 
compilers?

Best Regards
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
 


On 05/05/2022, 08:43, "Francesco Pretto"  wrote:

Hi Christopher,

On Thu, 5 May 2022 at 08:16, Christopher Creutzig
 wrote:
>
> Backporting required parts of std::span and having a layer that switches 
between std::format and fmtlib shouldn’t be too hard, assuming these are only 
used internally and nothing in the API expects span inputs, for example. But 
would we require contributors to use a C++20 enabled compiler?
>
>

Please note that what you suggest is already in place in pdfmm: I
use fmtlib in place of std::format and a backported std::span.

Thinking a little bit more about the topic, let's not argue too much
about this: for the API/ABI surface I will just stick to the C++17
requirement and hide the fact I'm using std::format and std::span.

Cheers,
Francesco


___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users




[Podofo-users] Patch for CVE-2018-20797

2022-04-11 Thread Mark Rogers
Hi

Here’s a patch for CVE-2018-20797.

The problem occurs in the PdfPredictorDecoder constructor, which calls 
podofo_calloc to allocate a buffer whose size is computed by multiplying 
together values from the pDecodeParms dictionary:

m_nRows = (m_nColumns * m_nColors * m_nBPC) >> 3;

If one of these values is negative, m_nRows is negative and turns into a 
large positive value when passed as the unsigned size_t argument of podofo_calloc.

A related problem occurs when large positive values in pDecodeParms overflow 
when multiplied together and so produce the wrong buffer size (e.g. if 
m_nColumns=1, m_nBPC=2 and m_nColors=SIZE_MAX/2+1).
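A minimal sketch of an overflow-checked size computation, assuming the three parameters have already been read from /DecodeParms as int. ComputeRowLength is a hypothetical helper, not PoDoFo's API, and this is not the code in the attached patch:

```cpp
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not PoDoFo's API): computes the same value as
// (m_nColumns * m_nColors * m_nBPC) >> 3, but in size_t, after rejecting
// non-positive parameters and any product that would overflow size_t.
std::size_t ComputeRowLength(int columns, int colors, int bpc)
{
    if (columns <= 0 || colors <= 0 || bpc <= 0)
        throw std::invalid_argument("DecodeParms values must be positive");

    const std::size_t maxSize = std::numeric_limits<std::size_t>::max();
    std::size_t product = static_cast<std::size_t>(columns);
    for (int factor : { colors, bpc })
    {
        const std::size_t f = static_cast<std::size_t>(factor);
        if (product > maxSize / f)  // product * f would wrap around
            throw std::overflow_error("DecodeParms values overflow buffer size");
        product *= f;
    }
    return product >> 3;
}
```

Rejecting bad values before the multiplication matters because converting a negative int to size_t is well defined but wraps: -14 becomes SIZE_MAX - 13 (0xFFFFFFF2 when size_t is 32 bits), which is the huge allocation request described above.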

This has been tested in production for a few months on Mac 64-bit / Windows 
32-bit.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



Attachment: patch-CVE-2018-20797.diff


Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

2022-02-02 Thread Mark Rogers
Yes, I’d considered a new ePdfError value, but that needs more discussion since 
it affects the ABI. Even without the patch there are currently cases where 
ePdfError_InvalidXRef is thrown but an invalid xref isn’t the problem (e.g. 
trailer loops).

A new ePdfError value would break client code that has special handling when 
ePdfError_InvalidXRef is caught. Apart from the PoDoFo unit tests, the most 
likely case is client code mapping ePdfError_InvalidXRef to an error message 
like “This PDF is corrupt”.

Possible options:

  1.  Add a new error code like ePdfError_RecursionTooDeep and throw that 
      instead of ePdfError_InvalidXRef in the recursion guard. That might break 
      client code, as discussed above:

      ePdfError_RecursionTooDeep,   /* recursion deeper than s_maxRecursionDepth */

  2.  Keep ePdfError_InvalidXRef and document that it’s not always an invalid 
      XRef:

      ePdfError_InvalidXRef,        /* The XRef table is invalid or recursion is too deep */

  3.  I don’t think replacing ePdfError_InvalidXRef completely is an option, 
      since it also gets thrown for invalid xrefs where recursion isn’t involved.

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



From: Michal Sudolsky 
Date: Wednesday, 2 February 2022 at 19:38
To: "PowerMapper.com" 
Cc: Christopher Creutzig , 
"podofo-users@lists.sourceforge.net" 
Subject: Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs


+ if ( s_nRecursionDepth > s_maxRecursionDepth )
+ {
+     // avoid stack overflow on documents that have circular cross references, loops
+     // or very deeply nested structures, can happen with
+     // /Prev entries in trailer and XRef streams (possible via a chain of entries with a loop)
+     // /Kids entries that loop back to self or parent
+     // deeply nested Dictionary or Array objects (possible with lots of brackets)
+     // mutually recursive loops involving several objects are possible
+     PODOFO_RAISE_ERROR( ePdfError_InvalidXRef );
+ }



Not all these cases are invalid xref errors.

On Wed, Feb 2, 2022 at 7:16 PM Mark Rogers <mark.rog...@powermapper.com> wrote:
Hi Everyone

Here are patches for recursive stack consumption, which should fix 
CVE-2018-8002, CVE-2021-30470, CVE-2021-30471 and CVE-2020-18971.

This works by refactoring the recursion guard and making it a nested class of 
PdfTokenizer (as it’s mostly used by the tokenizer and parser). As agreed 
earlier in this thread, the patch means that PoDoFo requires C++11 if compiled 
with PODOFO_MULTI_THREAD.
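A minimal sketch of that shape, assuming a simplified Tokenizer class in place of PdfTokenizer, a hard-coded limit of 256 and std::runtime_error in place of PODOFO_RAISE_ERROR. All names here are illustrative; this is not the code in the patch:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative stand-in for PdfTokenizer: only the recursion guard and one
// recursive method are modelled.
class Tokenizer
{
public:
    // Nested RAII guard. Every guarded method shares one counter per thread,
    // so mutually recursive methods are limited together.
    class RecursionGuard
    {
    public:
        RecursionGuard()
        {
            if (++s_nRecursionDepth > s_maxRecursionDepth)
            {
                --s_nRecursionDepth;  // a throwing constructor gets no destructor call
                throw std::runtime_error("recursion too deep");
            }
        }
        ~RecursionGuard() { --s_nRecursionDepth; }
        RecursionGuard(const RecursionGuard&) = delete;
        RecursionGuard& operator=(const RecursionGuard&) = delete;

    private:
        static thread_local int s_nRecursionDepth;  // C++11 thread_local
        static const int s_maxRecursionDepth = 256;
    };

    // Parses "[...]" containing only nested brackets and returns the nesting
    // depth; recurses once per '[' like a recursive ReadArray.
    static int ParseArray(const std::string& s)
    {
        std::size_t pos = 0;
        if (s.empty() || s[pos++] != '[')
            throw std::runtime_error("expected '['");
        return ReadArray(s, pos);
    }

private:
    static int ReadArray(const std::string& s, std::size_t& pos)
    {
        RecursionGuard guard;
        int depth = 1;
        while (pos < s.size())
        {
            const char c = s[pos++];
            if (c == '[')
            {
                const int inner = 1 + ReadArray(s, pos);
                if (inner > depth)
                    depth = inner;
            }
            else if (c == ']')
                return depth;
        }
        throw std::runtime_error("unterminated array");
    }
};

// Out-of-class definition of the per-thread counter (thread_local must be repeated).
thread_local int Tokenizer::RecursionGuard::s_nRecursionDepth = 0;
```

Making the counter a static member of the nested class gives a single depth count per thread, shared by every guarded method, and a thread that never parses never touches another thread's count.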

The patch has been tested against the CVE PoC files and the new unit tests. 
It’s also been tested in production for 2 months on macOS (64-bit) and Windows 
(32-bit).

We haven’t tested on Linux. This might be relevant for the 
ParserTest::getStackOverflowDepth() unit test method which calculates an 
overflow depth for each platform that causes stack overflow without exhausting 
the heap (although the calculation should be the same as macOS since they both 
use the same System V AMD64 ABI).

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL





From: Michal Sudolsky <sudols...@gmail.com>
Date: Thursday, 25 November 2021 at 18:25
To: Christopher Creutzig <ccreu...@mathworks.com>
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs



On Thu, Nov 25, 2021 at 4:58 PM Christopher Creutzig <ccreu...@mathworks.com> wrote:
> Ok, I can submit a patch which uses C++11 thread_local when 
> PODOFO_MULTI_THREAD is defined. The recursion guard definition will look like 
> this:
>
> #if defined(PODOFO_MULTI_THREAD)
>  static int thread_local s_nRecursionDepth; // PoDoFo with threading support requires C++11 compiler with thread_local
> #else
>  static int  s_nRecursionDepth;  // PoDoFo is single threaded
> #endif
>
> Does that work for everyone?

Looks good to me, and the comment is hopefully explanation enough if anyone 
runs into a compile time error. Please do include a doc patch stating the 
requirement.

Can we get a macro that creates this thread-local integer and the recursion 
guard object all in one go, with the connotation that the recursion guard is 
meant to usually be applied to each affected function entry point separately? 
(Unless that is not what it is meant to do. I think we could just as well make 
an argument for a single recursion depth counter per thread, which then 
probably should become a static member of the recursion guard class.)

I think that it was meant that there will be just a single recursion counter 
per thread.



Cheers,
Christopher

The MathWorks GmbH | Friedlandstr.18 | 52064 Aachen | District Court Aachen | 
HRB 8082 | Managing Directors: Bertrand Dissler, Steven D. Barbo, Jeanne O’Keefe



From: Mark Rogers <mark.rog...@powermapper.com>
Sent: Thursday, November 25, 2021 16:33
To: Christopher Creutzig <ccreu...@mathworks.com>; Michal Sudolsky <sudols...@gmail.com>
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

>>> I like this idea. As PODOFO_MULTI_THREAD will enable C++11
>>Is there a chance we might get there? Who would be able to make that decision?
>consider the decision being made. Again, from my point of view. In
>other words, feel free to provide a patch with the suggested changes.

Ok, I can submit a patch which uses C++11 thread_local when PODOFO_MULTI_THREAD 
is defined. The recursion guard definition will look like this:

#if defined(PODOFO_MULTI_THREAD)
  static int thread_local s_nRecursionDepth; // PoDoFo with threading support requires C++11 compiler with thread_local
#else
  static int  s_nRecursionDepth;  // PoDoFo is single threaded
#endif

Does that work for everyone?

> Not when the user of podofo already used some 64 KB before calling podofo. To 
> me it seems more reasonable to use a more conservative value which would not 
> consume more than some half (or tenth?) of the available stack in the worst 
> case.

I’ll also reduce the 500 max recursion depth as suggested (probably to 256)

And I’ll also include the new parser unit tests which test for deep recursion 
and reference loops

We’re also testing a patch for CVE-2018-20797. This is caused by an invalid 
negative value for one of the FlateDecode compression parameters, which results 
in a call to podofo_calloc( -14 ), i.e. podofo_calloc( 0xfffffff2 ) when size_t 
is 32 bits.
Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



From: Christopher Creutzig <ccreu...@mathworks.com>
Date: Thursday, 25 November 2021 at 07:16
To: Michal Sudolsky <sudols...@gmail.com>
Cc: "PowerMapper.com", "podofo-users@lists.sourceforge.net"
Subject: RE: [Podofo-users] PoDoFo and recursive stack consumption CVEs

>> If we want to avoid UB in the multithreaded world, I’m afraid we will have 
>> to make a C++11 compiler a requirement, as C++03 never acknowledged the 
>> existence of threads. (That is not limited to this place, a lot of methods 
>> like PdfEncodingFactory::GlobalPdfRomanEncodingInstance are not currently 
>> threadsafe in C++03, as discussed earlier.)
> That is not thread-safe even in C++11.

True, but C++11 or later would give us the tools to make it thread-safe.

> Except that some things are not so available as threads like for example 
> thread_local and atomic operations.

thread_local equivalents are available for g++, clang, and MSVC. That covers 
the compilers listed in 
https://github.com/NickNaso/PoDoFo/blob/master/README.md#installation_with_cmake.
 See my proposed PODOFO_THREAD_LOCAL in 
https://sourceforge.net/p/podofo/mailman/message/37389082/.

> I like this idea. As PODOFO_MULTI_THREAD will enable C++11

Is there a chance we might get there? Who would be able to make that decision?


Cheers,
Christopher

The MathWorks GmbH | Friedlandstr.18 | 52064 Aachen | District Court Aachen | 
HRB 8082 | Managing Directors: Bertrand Dissler, Steven D. Barbo, Jeanne O’Keefe



Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

2021-11-22 Thread Mark Rogers


>Do we agree this would implement different functionality, with the potential 
>of hard to debug sporadic effects depending on how the threads actually run?
>
Yes, although this already happens with PoDoFo – all the PoDoFo mutex methods 
are defined as no-ops unless  PODOFO_MULTI_THREAD is defined.

Personally, I’d be much happier if PoDoFo only supported modern compilers, but 
it’s hard to know which compilers people actually use (especially for embedded 
systems). One of the reasons for the PdfMM fork was the inability to use modern 
C++ in PoDoFo.

I don’t think it’s helpful that PoDoFo supports very old compilers like Visual 
C++ 6 and Visual Studio 2005 with known security vulnerabilities (VS 2005 
reached end of life in 2016 and no longer receives security updates). Shipping 
code built using a compiler containing unpatched security vulnerabilities is 
fundamentally unsafe. Even testing that the build system still works with old 
compilers is potentially dangerous.

Only supporting compilers that still get security updates is a simple way to 
get rid of old compilers, and easy to justify.

> These compilers may or may not implement thread_local as macros,.

I think the standard says it’s a macro:
https://en.cppreference.com/w/c/thread/thread_local

Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



From: Christopher Creutzig 
Date: Monday, 22 November 2021 at 12:53
To: "podofo-users@lists.sourceforge.net" 
Subject: Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

> The thread_local macro is available in

These compilers may or may not implement thread_local as macros, but 
technically, it is a C++ keyword, and we should not use ifdef(thread_local). I 
think #if __cplusplus >= 201103L would be a more robust option.

> Another option is adding a fallback for very old compilers that uses mutexes.

Do we agree this would implement different functionality, with the potential of 
hard to debug sporadic effects depending on how the threads actually run?

I think something like the following is more promising:

#if __cplusplus >= 201103L || defined(thread_local)
#  define PODOFO_THREAD_LOCAL thread_local
#else
#  ifdef _MSC_VER
#    define PODOFO_THREAD_LOCAL __declspec(thread)
#  elif defined(__GNUC__)
#    define PODOFO_THREAD_LOCAL __thread
#  else
#    error "Unknown old compiler, please add thread local support"
#  endif
#endif


Cheers,
Christopher
The MathWorks GmbH | Friedlandstr.18 | 52064 Aachen | District Court Aachen | 
HRB 8082 | Managing Directors: Bertrand Dissler, Steven D. Barbo, Jeanne O’Keefe




Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

2021-11-22 Thread Mark Rogers
The 500 limit should be enough for a 32-bit release build with a 256 KB stack 
(it would use about 195 KB)
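As a rough check of these numbers, assuming about 400 bytes of stack per nested ReadArray call (the x86/x64 release-build figure in the earlier message quoted below) and 1 KB = 1024 bytes:

```latex
500 \times 400\ \text{bytes} = 200\,000\ \text{bytes} \approx 195\ \text{KB},
\qquad
\frac{256 \times 1024\ \text{bytes}}{400\ \text{bytes per level}} = \frac{262\,144}{400} \approx 655\ \text{levels}
```

The second figure matches the 655 ‘[’ characters that overflow a 256 KB IIS worker stack, and the first leaves roughly 60 KB of that stack for everything else.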

I think there should be a user-configurable limit 
(PdfRecursionGuard::SetMaxRecursionDepth) to override the 500 limit on systems 
with unusually small stacks; some embedded systems have less than 10 KB of stack.

Also worth discussing: should it be possible to disable the recursion guard 
completely with SetMaxRecursionDepth(0)? This is a bad idea with untrusted 
input, but might make sense in some situations.

We’ve written a lot of new parser unit tests to test deeply nested and looping 
PDF structures. We’ll submit these along with a patch - these tests make it 
easy to experiment with different patches for the same issue.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



From: Michal Sudolsky 
Date: Saturday, 20 November 2021 at 22:08
To: "PowerMapper.com" 
Cc: Christopher Creutzig , 
"podofo-users@lists.sourceforge.net" 
Subject: Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

> Is option 3) worth investigating?

It seems like the best solution in the current circumstances. Another solution 
could be to eliminate all recursion, but a thread_local recursion counter is 
simpler to do.

> #else
> // fallback to process global recursion count – overcounts depth if PDFs processed in parallel on multiple threads, same result as thread_local if process single threaded
> #define PODOFO_THREAD_LOCAL
> #endif

But this fallback would cause UB unless all access to t_nRecursionDepth is 
atomic or guarded by a mutex.

> In a release build on x86/x64 each recursive ReadArray call uses about 400 
> bytes of stack.
>
>   *   Windows IIS 32-bit worker processes – 256 KB max stack (stack overflows 
>       with 655 ‘[’ characters)

So would a default limit of 500 be enough? It should be far enough from the 
value that would cause stack overflow.

> I had a look at the patch in https://sourceforge.net/p/podofo/tickets/25/#51b9. 
> That’s a simpler solution than the changes I proposed for PdfRecursionGuard.

That patch introduces UB (for the same reason as above).



Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

2021-11-22 Thread Mark Rogers
Hi

I think there are a couple of options here:

Option 1)
Make C++11 a requirement only when PoDoFo is compiled with PODOFO_MULTI_THREAD. 
That means the change won’t affect users who compile PoDoFo single-threaded.

So the recursion guard definition would look something like this:

#if !defined(PODOFO_MULTI_THREAD)
  static int  s_nRecursionDepth;  // PoDoFo is single threaded
#elif defined(thread_local)
  static int thread_local s_nRecursionDepth; // PoDoFo has threading support and using C++11 compiler
#else
#error C++11 thread_local is required for multi-thread
#endif

The thread_local macro is available in

GCC 4.8.0 and later (4.8.0 was released 2013)
clang 3.3 and later (3.3 was released 2013)
clang (macOS fork) (available in Xcode 8, which was released in 2016)
Visual Studio 2015 and later

Option 2)
Another option is adding a fallback for very old compilers that uses mutexes. 
This mutex fallback won’t be used by many people (because most will use modern 
compilers that support C++11) but might make it easier to agree on a patch (and 
isn’t hard to implement). It would look something like this:

#if !defined(PODOFO_MULTI_THREAD)
  static int  s_nRecursionDepth;  // PoDoFo is single threaded
#elif defined(thread_local)
  static int thread_local s_nRecursionDepth; // PoDoFo has threading support and using C++11 compiler
#else
  static Util::PdfMutex s_guardMutex;
  static int  s_nRecursionDepth;  // PoDoFo is multi threaded and this needs to be protected by a mutex
#endif


PdfRecursionGuard::~PdfRecursionGuard()
{
#if !defined(PODOFO_MULTI_THREAD)
  --s_nRecursionDepth;  // PoDoFo is single threaded
#elif defined(thread_local)
  --s_nRecursionDepth; // PoDoFo has threading support but this is thread_local
#else
  PdfMutexWrapper lock(s_guardMutex);
  --s_nRecursionDepth;  // PoDoFo is multi threaded and this needs to be protected by a mutex
#endif
}

Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL




From: Christopher Creutzig 
Date: Monday, 22 November 2021 at 07:45
To: Michal Sudolsky, "PowerMapper.com"
Cc: "podofo-users@lists.sourceforge.net"
Subject: RE: [Podofo-users] PoDoFo and recursive stack consumption CVEs

From: Michal Sudolsky <sudols...@gmail.com>
> But this fallback would cause UB unless all access to t_nRecursionDepth is 
> atomic or guarded by mutex.

If we want to avoid UB in the multithreaded world, I’m afraid we will have to 
make a C++11 compiler a requirement, as C++03 never acknowledged the existence 
of threads. (That is not limited to this place, a lot of methods like 
PdfEncodingFactory::GlobalPdfRomanEncodingInstance are not currently threadsafe 
in C++03, as discussed earlier.)

Now, if we do that, the solution is easy, make the variables used with the 
recursion guard thread_local. (Atomic or mutex are not a good solution, as 
those would make the recursion guard depend on what other threads are doing at 
the same time.)


Cheers,
Christopher
The MathWorks GmbH | Friedlandstr.18 | 52064 Aachen | District Court Aachen | 
HRB 8082 | Managing Directors: Bertrand Dissler, Steven D. Barbo, Jeanne O’Keefe




Re: [Podofo-users] PoDoFo and recursive stack consumption CVEs

2021-11-13 Thread Mark Rogers
Hi

Loop detection may be worth doing, but it doesn’t fix the problem when objects 
contain no loops but are so deeply nested that they trigger stack overflow. Valid 
files with no loops but deeply nested objects can cause stack overflow, in the 
same way that valid files with large objects cause out-of-memory conditions. 
This is expected in ISO 32000, and described in Annex C.3 – Memory Limits.

An example is a nested array – if you add enough square brackets to MediaBox 
below it triggers stack overflow because PdfTokenizer::ReadArray is called 
recursively for each ‘[’ token.

%PDF-1.0
1 0 obj<>endobj 2 0 obj<>endobj 3 0 obj<https://man7.org/linux/man-pages/man3/pthread_create.3.html#NOTES

The same problem happens with dictionaries - Go-To Actions have target 
dictionaries that may be nested recursively with no specified limit (ISO 32000 
12.6.4.4) so a valid file with very deeply nested target dictionaries can cause 
stack overflow due to recursive calls to PdfTokenizer::ReadDictionary.

We have a simple patch for the tokenizer stack overflow issues (adding 
PdfRecursionGuard guard(m_nRecursionDepth) to PdfTokenizer::GetNextVariant) – 
but it needs PdfRecursionGuard moved to a header (like the patch for ticket 25).

The same problem happens with outlines – if they’re nested deeply enough and 
have no loops they still trigger stack overflow in 
PdfOutlineItem::PdfOutlineItem.

I had a look at the patch in https://sourceforge.net/p/podofo/tickets/25/#51b9. 
That’s a simpler solution than the changes I proposed for PdfRecursionGuard.

In addition to the patch for tokenizer recursion I’ll write some new unit tests 
for loops and deeply nested structures

BTW I’m the original author of PdfRecursionGuard - 
https://sourceforge.net/p/podofo/tickets/7/#df09 and the PdfParser unit tests 
https://sourceforge.net/p/podofo/mailman/message/36298123/

Cheers
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL





From: Christopher Creutzig 
Date: Thursday, 28 October 2021 at 07:50
To: Mark Rogers , 
"podofo-users@lists.sourceforge.net" 
Subject: RE: PoDoFo and recursive stack consumption CVEs

From: Mark Rogers 
> tl;dr
> PoDoFo needs a general mechanism to prevent recursive stack consumption 
> because this can happen in many different places in PDFs. Even if the issues 
> above are fixed there will still be other stack overflow issues in PoDoFo.

I agree. But I would like to propose yet another potential solution:

4) Parsing PDF does not create an unbounded number of different objects. Quite 
the opposite: Resolving the same indirect reference for the second time results 
in exactly the same object as the first time; semantically, the PDF file just 
has another edge coming into it.

Therefore, we may want to have a cache of resolved references. If there are 
cases where the same object in the PDF file needs to be represented by multiple 
PoDoFo classes, the cache may need to be able to store these several 
representations. Whether trying to resolve a reference currently under 
construction should throw a “recursive definition” error or provide a handle 
that is going to update as needed is something to discuss.

Does this make sense?
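A minimal sketch of option 4, assuming a toy object model where each object number maps to the numbers it references. Object, Resolver and Resolve are hypothetical names, not PoDoFo classes, and the sketch takes the "throw a recursive definition error" branch of the choice described above:

```cpp
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy object model (hypothetical): an object and the objects it references.
struct Object
{
    int number = 0;
    std::vector<std::shared_ptr<Object>> children;
};

// Resolves references through a cache, so the same object number always yields
// the same Object, and throws if an object is requested while still being built.
class Resolver
{
public:
    explicit Resolver(std::map<int, std::vector<int>> refs) : m_refs(std::move(refs)) {}

    std::shared_ptr<Object> Resolve(int number)
    {
        auto cached = m_cache.find(number);
        if (cached != m_cache.end())
            return cached->second;  // just another edge into an existing node

        if (!m_inProgress.insert(number).second)
            throw std::runtime_error("recursive definition of object " + std::to_string(number));

        auto obj = std::make_shared<Object>();
        obj->number = number;
        for (int child : m_refs.at(number))
            obj->children.push_back(Resolve(child));

        m_inProgress.erase(number);
        m_cache[number] = obj;
        return obj;
    }

private:
    std::map<int, std::vector<int>> m_refs;          // object number -> referenced numbers
    std::map<int, std::shared_ptr<Object>> m_cache;  // fully resolved objects
    std::set<int> m_inProgress;                      // objects under construction
};
```

Because Resolve still recurses once per level, this catches reference loops but not very deep loop-free nesting, so it would complement rather than replace a depth limit.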


Cheers,
Christopher

The MathWorks GmbH | Friedlandstr.18 | 52064 Aachen | District Court Aachen | 
HRB 8082 | Managing Directors: Bertrand Dissler, Steven D. Barbo, Jeanne O’Keefe




[Podofo-users] PoDoFo and recursive stack consumption CVEs

2021-10-27 Thread Mark Rogers
 the C11 thread_local keyword or one of the older 
equivalents,  but it can be done something like this:

#if defined( thread_local )
// thread_local is a macro which can be used as a feature test
// see https://en.cppreference.com/w/c/thread/thread_local
#define PODOFO_THREAD_LOCAL thread_local

#elif defined(_MSC_VER) && defined( WINVER ) && ( WINVER >= _WIN32_WINNT_VISTA )
// supported on Windows Vista and above (limited support on Windows XP)
#define PODOFO_THREAD_LOCAL __declspec( thread )

#elif defined(__GNUC__) && ( __GNUC__ >= 3 )
// supported on GCC https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html#Thread-Local
#define PODOFO_THREAD_LOCAL __thread

#else
// fallback to process global recursion count – overcounts depth if PDFs processed in parallel on multiple threads, same result as thread_local if process single threaded
#define PODOFO_THREAD_LOCAL

#endif



PdfRecursionGuard()
{
    ++t_nRecursionDepth;

    if ( t_nRecursionDepth > s_maxRecursionDepth )
    {
        PODOFO_RAISE_ERROR( ePdfError_OutOfMemory ); // ePdfError_NestedTooDeep might be better
    }
}

~PdfRecursionGuard()
{
    --t_nRecursionDepth;
}

static void SetMaxRecursionDepth( int32_t maxRecursionDepth )
{
    s_maxRecursionDepth = maxRecursionDepth;
}

static PODOFO_THREAD_LOCAL int t_nRecursionDepth = 0;

static int s_maxRecursionDepth = 500; // user-configurable with reasonable default


Adding recursion counting to a method just involves adding the following local 
variable to any method you need to guard:
PdfRecursionGuard guard;
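A minimal sketch of how the proposal behaves in use, assuming C++11 thread_local in place of the PODOFO_THREAD_LOCAL macro, std::runtime_error in place of PODOFO_RAISE_ERROR, out-of-class definitions for the static members, and a hypothetical GetDepth() accessor for testing. The class name mirrors the proposal, but this is a sketch, not PoDoFo's implementation; Descend() stands in for any guarded method:

```cpp
#include <cstdint>
#include <stdexcept>

class PdfRecursionGuard
{
public:
    PdfRecursionGuard()
    {
        ++t_nRecursionDepth;
        if (t_nRecursionDepth > s_maxRecursionDepth)
        {
            // No destructor runs when a constructor throws, so undo the
            // increment here; otherwise each rejected call leaks one level.
            --t_nRecursionDepth;
            throw std::runtime_error("nested too deep");
        }
    }
    ~PdfRecursionGuard() { --t_nRecursionDepth; }
    PdfRecursionGuard(const PdfRecursionGuard&) = delete;
    PdfRecursionGuard& operator=(const PdfRecursionGuard&) = delete;

    static void SetMaxRecursionDepth(std::int32_t maxRecursionDepth)
    {
        s_maxRecursionDepth = maxRecursionDepth;
    }
    static int GetDepth() { return t_nRecursionDepth; }  // hypothetical, for testing

private:
    static thread_local int t_nRecursionDepth;
    static std::int32_t s_maxRecursionDepth;
};

thread_local int PdfRecursionGuard::t_nRecursionDepth = 0;
std::int32_t PdfRecursionGuard::s_maxRecursionDepth = 500;  // user-configurable default

// Any guarded method only needs the local guard variable.
int Descend(int levels)
{
    PdfRecursionGuard guard;
    return levels == 0 ? 0 : 1 + Descend(levels - 1);
}
```

Because the guard is a local object, the counter is also restored when an exception unwinds through guarded methods, so a failed parse leaves the depth at zero for the next document on that thread.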


Is option 3) worth investigating?
What does everyone think?

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



[Podofo-users] PodoFo Unit Test patches

2019-04-25 Thread Mark Rogers
Hi

Here are some patches to fix some problems in the PoDoFo unit tests

podofo-GetTempFilename.patch
On Windows the GetTempPath function can return 261 characters (MAX_PATH+1) but 
the buffer allocated in TestUtils.cpp is 256 characters. The patch increases the 
buffer size to 261 characters.

podofo-Arial.patch
The Encryption and Pages Tree unit tests fail if the font ‘Arial’ is not 
installed. This patch changes ‘Arial’ to PODOFO_HPDF_FONT_HELVETICA, one of the 
fonts in PODOFO_BUILTIN_FONTS

Podofo-NoOpenSSL.patch
The Encryption unit tests fail with unexpected exception failures if PoDoFo is 
compiled without OpenSSL support, because the PdfEncrypt::CreatePdfEncrypt 
methods throw an ePdfError_NotCompiled exception. This patch checks that 
ePdfError_NotCompiled is thrown when PODOFO_HAVE_OPENSSL is not defined. The 
patch has been constructed to avoid any behavior changes when 
PODOFO_HAVE_OPENSSL is defined, by rethrowing the exception in methods that 
didn’t have try … catch previously.
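A minimal sketch of that test pattern, using hypothetical stand-ins: HAVE_CRYPTO for PODOFO_HAVE_OPENSSL, CreateEncrypt for PdfEncrypt::CreatePdfEncrypt, and a small Error type instead of PdfError:

```cpp
#include <stdexcept>

enum class ErrorCode { NotCompiled, Other };

// Hypothetical error type carrying a code, standing in for PdfError.
struct Error : std::runtime_error
{
    explicit Error(ErrorCode c) : std::runtime_error("PdfError"), code(c) {}
    ErrorCode code;
};

// Hypothetical factory: without crypto support it can only report that the
// feature was not compiled in.
int CreateEncrypt()
{
#ifdef HAVE_CRYPTO
    return 1;  // an encryption object would be created here
#else
    throw Error(ErrorCode::NotCompiled);
#endif
}

// Test body: with crypto, any error is rethrown unchanged (as before the
// patch); without crypto, only NotCompiled counts as a pass.
bool TestCreateEncrypt()
{
    try
    {
        CreateEncrypt();
#ifdef HAVE_CRYPTO
        return true;
#else
        return false;  // expected NotCompiled, but nothing was thrown
#endif
    }
    catch (const Error& e)
    {
#ifndef HAVE_CRYPTO
        if (e.code == ErrorCode::NotCompiled)
            return true;
#endif
        (void)e;
        throw;
    }
}
```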

Patches tested on Windows / Mac without OpenSSL support. Not tested on Linux.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



podofo-Arial.patch
Description: podofo-Arial.patch


podofo-NoOpenSSL.patch
Description: podofo-NoOpenSSL.patch


podofo-GetTempFilename.patch
Description: podofo-GetTempFilename.patch


[Podofo-users] PoDoFo PdfString::Write buffer overflow

2018-04-19 Thread Mark Rogers
Hi

This code from PdfString::Write has a buffer overflow: it only checks that 
m_buffer.GetSize() > 0, then sets nInputBufferLen = GetSize() - 2, which is passed 
to new[nInputBufferLen] and to memcpy.

if( pEncrypt && m_buffer.GetSize() && IsValid() )
{
    pdf_long nInputBufferLen = m_buffer.GetSize() - 2; // Cut off the trailing pair of zeros
    pdf_long nUnicodeMarkerOffet = sizeof( PdfString::s_pszUnicodeMarker );
    if( m_bUnicode )
        nInputBufferLen += nUnicodeMarkerOffet;

    char * pInputBuffer = new char[nInputBufferLen];

    if( m_bUnicode )
    {
        memcpy(pInputBuffer, PdfString::s_pszUnicodeMarker, nUnicodeMarkerOffet);
        memcpy(&pInputBuffer[nUnicodeMarkerOffet], m_buffer.GetBuffer(), nInputBufferLen - nUnicodeMarkerOffet);
    }
    else
        memcpy(pInputBuffer, m_buffer.GetBuffer(), nInputBufferLen);


}

If m_buffer.GetSize() == 1 and m_bUnicode is false then
  nInputBufferLen = -1;
  // bad_alloc or undefined behaviour when a -1 sized array is allocated
  char* pInputBuffer = new char[-1];
  // -1 converted to size_t is SIZE_MAX
  memcpy( pInputBuffer, m_buffer.GetBuffer(), -1 );

If m_buffer.GetSize() == 1 and m_bUnicode is true then
  nInputBufferLen = 1;   // -1 plus the 2-byte Unicode marker
  char* pInputBuffer = new char[1];
  // 2-byte marker copied into a 1-byte buffer
  memcpy( pInputBuffer, PdfString::s_pszUnicodeMarker, 2 );
  // 1 - 2 = -1 bytes requested, i.e. SIZE_MAX after conversion to size_t
  memcpy( &pInputBuffer[2], m_buffer.GetBuffer(), 1 - 2 );

If m_buffer.GetSize() == 2 and m_bUnicode is false then
  nInputBufferLen = 0;
  char* pInputBuffer = new char[0];
  // a zero-length new[] is legal, but dereferencing the result is undefined behaviour
  // https://stackoverflow.com/a/1087066
  memcpy( pInputBuffer, m_buffer.GetBuffer(), 0 );

If m_buffer.GetSize() == 2 and m_bUnicode is true then
  nInputBufferLen = 2;
  char* pInputBuffer = new char[2];
  memcpy( pInputBuffer, PdfString::s_pszUnicodeMarker, 2 );
  // the destination is one past the end of the buffer, and the C standard says it
  // must still be a valid pointer even for a zero-byte copy
  // https://stackoverflow.com/a/3751937
  memcpy( &pInputBuffer[2], m_buffer.GetBuffer(), 2 - 2 );
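
One way to avoid all four cases is to reject buffers shorter than the two trailing zeros and to skip zero-length copies. This is a sketch, not the PoDoFo fix: BuildInputBuffer is a hypothetical helper, std::vector replaces new[], and the Unicode marker is assumed to be the 2-byte FE FF sequence implied by the 2 - 2 arithmetic above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Assumed 2-byte UTF-16BE marker, standing in for PdfString::s_pszUnicodeMarker.
static const char kUnicodeMarker[2] = { static_cast<char>( 0xFE ), static_cast<char>( 0xFF ) };

// Builds the buffer handed to the encryptor. 'data' and 'size' play the role of
// m_buffer.GetBuffer() and m_buffer.GetSize(), which include two trailing zero bytes.
std::vector<char> BuildInputBuffer( const char* data, std::size_t size, bool isUnicode )
{
    if( size < 2 )  // guard: subtracting 2 below must not underflow
        throw std::invalid_argument( "string buffer shorter than its terminator" );

    const std::size_t payload = size - 2;  // cut off the trailing pair of zeros
    const std::size_t markerLen = isUnicode ? sizeof( kUnicodeMarker ) : 0;

    std::vector<char> input( markerLen + payload );  // may legitimately be empty
    if( markerLen > 0 )
        std::memcpy( input.data(), kUnicodeMarker, markerLen );
    if( payload > 0 )  // skip zero-byte copies so no one-past-the-end pointer reaches memcpy
        std::memcpy( input.data() + markerLen, data, payload );
    return input;
}
```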

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



[Podofo-users] PdfParser unit tests

2018-04-19 Thread Mark Rogers
Hi

Here are the unit tests for PoDoFo::PdfParser I’ve been working on. I’ve just 
included the .cpp and .h files rather than a patch since they’re new files.

I’ve not included a patch for CMakeLists.txt, but I think all that’s needed 
is adding ParserTest.cpp to the CMakeLists.txt file for the unit tests 
(CppUnit takes care of everything else).

What’s tested:

  *   CVE-2017-8053, CVE-2015-8981, CVE-2017-5853, CVE-2018-5296, CVE-2017-8787, CVE-2018-5295, CVE-2017-8378
  *   Stress testing of ReadXRefSubsection( nFirstObject, nNumObjects ) with lots of different values for nFirstObject and nNumObjects
  *   Stress testing of ReadXRefSubsection with different values supplied to PdfParser::SetMaxObjectCount
  *   Testing other PdfParser functions for infinite recursion, out-of-memory handling, etc.
  *   See comments in ParserTest.h
  *   2k lines of code, but still lots more that can be tested…

Test results

  *   Stack overflow in ReadXRefContents and ReadXRefStreamContents; see https://sourceforge.net/p/podofo/tickets/7/
  *   If this is patched (I have a patch), the tests run successfully on Windows 10 with VC++ 2015 and on macOS 10.11 with Xcode 8/Clang and AddressSanitizer enabled
  *   There’s a problem on macOS 10.13 (a SIGKILL when allocating a lot of memory), but it’s probably a macOS problem (10.13 is very buggy)

Not tested:

  *   Win64 build
  *   Linux with GCC: might need a small change to get the low-memory tests to work; see the comment in canOutOfMemoryKillUnitTests() at the end of ParserTest.cpp


Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



ParserTest.h
Description: ParserTest.h


ParserTest.cpp
Description: ParserTest.cpp


[Podofo-users] PDF CVE Security Research

2018-04-18 Thread Mark Rogers
Hi

This will be of interest to anyone testing PoDoFo or reviewing submitted 
patches. It’s an analysis of 122 PDF CVEs found across a number of PDF products, 
presented at the Black Hat Asia security conference in March 2017. Products with 
the most CVEs found:

88 – Acrobat
15 – Foxit
8 – Adobe Digital Editions
5 – Chrome
3 – Apple Preview
3 – Windows PDF Library

https://www.blackhat.com/docs/asia-17/materials/asia-17-Liu-Dig-Into-The-Attack-Surface-Of-PDF-And-Gain-100-CVEs-In-1-Year.pdf

The slides have links to the PDF CVE test repositories maintained by Google and 
Mozilla (these are useful for testing PoDoFo)
https://pdfium.googlesource.com/pdfium_tests/
https://github.com/mozilla/pdf.js/tree/master/test/pdfs

And an analysis of the PDF modules most affected by CVEs:

34 – PDF Convertor
24 – JPEG 2000
24 – XFA
21 – Rendering
12 – Fonts
4 – Others
3 – JPEG (raw)

Does PoDoFo support JPEG 2000 or XFA?

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



Re: [Podofo-users] [PATCH] PoFoFo: fix CVE-2018-5296 by reducing limit in s_nMaxObjects

2018-04-18 Thread Mark Rogers
Hi

VeraPDF enforces the 8,388,607 indirect object limit:
https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules#rule-6112-7

This answer on Adobe.com expands on the reason for the limit:
https://forums.adobe.com/thread/1041350 (Adobe Reader can’t load files with 
more than 8,388,607 indirect objects)

This patch just changes the default. A PoDoFo library user can accept the risk 
of uncontrolled memory allocation and restore the previous maximum by calling 
PoDoFo::PdfParser::SetMaxObjectCount( std::numeric_limits::max() ).

With the 8,388,607 object count limit, the maximum number of entries in 
m_offsets is 8,388,607, which uses this amount of memory for m_offsets:
32-bit: 8,388,607 * sizeof(PoDoFo::PdfParser::TXRefEntry) = 8,388,607 * 16 = 134 MB
64-bit: 8,388,607 * sizeof(PoDoFo::PdfParser::TXRefEntry) = 8,388,607 * 24 = 201 MB

With the current object count limit (2**31 - 1 on 32-bit systems and 2**63 - 1 on 
64-bit systems), the maximum number of entries in m_offsets is smaller than the 
limit, because m_offsets.max_size() = std::numeric_limits::max() / 
sizeof(PoDoFo::PdfParser::TXRefEntry), which is:

32-bit: std::numeric_limits::max() / sizeof(PoDoFo::PdfParser::TXRefEntry) = 2**32 / 16 = 268,435,456 (requires the entire 32-bit address space)
64-bit: std::numeric_limits::max() / sizeof(PoDoFo::PdfParser::TXRefEntry) = 2**64 / 24 = about 7.7 x 10^17 (requires the entire 64-bit address space)
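
The arithmetic above can be checked at compile time. In this sketch the entry sizes (16 and 24 bytes) are taken from the message rather than measured from TXRefEntry, and max_size() is assumed to be SIZE_MAX / sizeof(entry) as stated:

```cpp
#include <cstdint>

// Entry sizes come from the figures above; they are not measured from
// PoDoFo::PdfParser::TXRefEntry.
constexpr std::uint64_t kMaxIndirectObjects = 8388607;  // 2^23 - 1
constexpr std::uint64_t kEntrySize32 = 16;
constexpr std::uint64_t kEntrySize64 = 24;

// Bytes used by m_offsets when it holds 'entries' entries of 'entrySize' bytes.
constexpr std::uint64_t OffsetsBytes( std::uint64_t entries, std::uint64_t entrySize )
{
    return entries * entrySize;
}

static_assert( kMaxIndirectObjects == ( std::uint64_t( 1 ) << 23 ) - 1, "limit is 2^23 - 1" );
static_assert( OffsetsBytes( kMaxIndirectObjects, kEntrySize32 ) == 134217712, "about 134 MB" );
static_assert( OffsetsBytes( kMaxIndirectObjects, kEntrySize64 ) == 201326568, "about 201 MB" );

// Without the limit, a 32-bit build is bounded only by max_size(), assumed here
// to be SIZE_MAX / sizeof(entry): about 2^28 entries, the whole 4 GB address space.
static_assert( UINT32_MAX / kEntrySize32 == 268435455, "about 2^28 entries" );
```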

Related: I don’t think PoDoFo checks the maximum number of objects when writing, 
so it can produce PDFs that Adobe Reader can’t read.

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



Hello Mark, hello all,

> On 15 April 2018 at 22:06 Mark Rogers  wrote:
>
> Hi
>
> Here’s a simple patch for CVE-2018-5296 – it reduces the limit returned
> by GetMaxObjectCount from std::numeric_limits::max() to 8,388,607 which
> is the limit for the maximum number of indirect objects specified
> in Table C.1 in Appendix C.2 Architectural Limits in PDF 32000-1:2008
>

The standard says there that the limits are for 32-bit systems, whereas PoDoFo
uses 64-bit types in many places, therefore I'm feeling a bit uneasy
with the patch. Can anyone please shed some more light on this issue?

> Best Regards
>
> Mark
>

Best regards, mabri



From: Mark Rogers 
Date: Sunday, 15 April 2018 at 21:06
To: "podofo-users@lists.sourceforge.net" 
Subject: [Podofo-users] [PATCH] PoFoFo: fix CVE-2018-5296 by reducing limit in 
s_nMaxObjects

Hi

Here’s a simple patch for CVE-2018-5296 – it reduces the limit returned by 
GetMaxObjectCount from std::numeric_limits::max() to 8,388,607, which is 
the limit for the maximum number of indirect objects specified in Table C.1 
in Appendix C.2 Architectural Limits in PDF 32000-1:2008.

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



[Podofo-users] [PATCH] PoFoFo: fix CVE-2018-5296 by reducing limit in s_nMaxObjects

2018-04-15 Thread Mark Rogers
Hi

Here’s a simple patch for CVE-2018-5296 – it reduces the limit returned by 
GetMaxObjectCount from std::numeric_limits::max() to 8,388,607, which is 
the limit for the maximum number of indirect objects specified in Table C.1 
in Appendix C.2 Architectural Limits in PDF 32000-1:2008.

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch-CVE-2018-5296.diff
Description: patch-CVE-2018-5296.diff


[Podofo-users] CVE-2017-5855 and CVE-2017-6844

2018-04-15 Thread Mark Rogers
Hi

I’ve been trying to write unit tests for CVE-2017-5855 and CVE-2017-6844, and 
now think both are false positives due to a bug in Address Sanitizer triggered 
by large values passed to std::vector::resize()

The issues were both found with American Fuzzy Lop (AFL) and Address Sanitizer 
(ASAN). AFL sets the ASAN environment variable allocator_may_return_null=1 
https://github.com/mirrorer/afl/blob/master/docs/env_variables.txt#L248

When allocator_may_return_null=1 is set, the C++ new operator and 
std::allocator return NULL when they cannot allocate memory, and do not throw 
std::bad_alloc as the C++ specification requires. This breaks the Standard C++ 
Library and is logged as a bug in ASAN:
https://github.com/google/sanitizers/issues/748
https://github.com/google/sanitizers/issues/295
https://github.com/google/sanitizers/issues/295#issuecomment-234273218 (comment 
by a libstdc++ library developer describes the behaviour seen in the CVEs)

CVE-2017-6844
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-6844
https://blogs.gentoo.org/ago/2017/03/02/podofo-global-buffer-overflow-in-podofopdfparserreadxrefsubsection-pdfparser-cpp/
the stack trace shows the problem occurring in a call to 
std::vector::resize(count)

CVE-2017-5855
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5855
https://blogs.gentoo.org/ago/2017/02/01/podofo-null-pointer-dereference-in-podofopdfparserreadxrefsubsection-pdfparser-cpp/
the stack trace shows the problem occurring in a call to 
std::vector::resize(count)

Without ASAN enabled, std::vector::resize with a large count will throw 
std::bad_alloc, which is caught by the catch( std::exception ) statement in 
ReadXRefSubsection.

Does this analysis make sense?

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



Re: [Podofo-users] CVE-2017-5853 and CVE-2017-6844 testing (overflow fixed, but unhandled exception present)

2018-04-14 Thread Mark Rogers
They're actual unit tests (a new tests/unit/ParserTest.cpp file), and most of the 
tests are for PdfParser::ReadXRefSubsection (responsible for CVE-2015-8981, 
CVE-2017-5853, CVE-2017-5855, CVE-2017-6844 and CVE-2018-5296, 14% of the CVEs 
discovered in PoDoFo).

I'll submit the new tests next week. My main concern is that adding a new .CPP 
and .H file to the build lists risks breaking the build very close to release. 

A safer option (until 0.9.6 is released) might be adding the new unit test 
files without changing the build - and anyone that's running tests can patch 
their build locally to include the new tests. 

Best Regards
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

On 13/04/2018, 21:58, "Mattia Rizzolo"  wrote:

On Fri, Apr 13, 2018 at 02:09:40PM +, Mark Rogers wrote:
> If I can also submit the parser unit tests now, but I was planning
> to wait until 0.9.6 release was complete

If you have actual unit tests (i.e., patches to tests/unit, or even
within tests/ only, and not external reproducers), I'd recommend
submitting them, and I would also recommend libpodofo maintainers to
accept them (as really, more tests can't possibly be a bad thing…).

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
more about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-




[Podofo-users] CVE-2017-5853 and CVE-2017-6844 testing (overflow fixed, but unhandled exception present)

2018-04-13 Thread Mark Rogers
Hi

I’ve been writing unit tests to check the fixes for various parser related CVEs.

I’m happy to say the fix for CVE-2017-5853 and CVE-2017-6844 prevents overflows 
on 32-bit (Win32) and LP64 (macOS) architectures for a wide range of values. 
I’ve not tested LLP64 (Win64) which may behave differently because sizeof(long) 
!= sizeof(size_t).

I did find 2 problems:


  1.  m_offsets.resize() can throw std::length_error as well as std::bad_alloc. 
The C++ spec also says implementations are allowed to throw other exceptions in 
addition to these, as long as they’re derived from the base class 
(std::exception). Currently ReadXRefSubsection throws a std::length_error 
instead of PdfError for large values of nFirstObject and nNumObjects; this 
causes an unhandled exception termination unless the caller is catching 
std::length_error. I think this needs fixing for 0.9.6, and the attached patch 
fixes that.

  2.  The PdfError thrown for out-of-range values is ePdfError_ValueOutOfRange 
for some values and ePdfError_InvalidXRef for other values (and the specific 
values change depending on whether the code is compiled for 32-bit or 64-bit). 
I don’t think this is serious enough to fix for 0.9.6, but the fix would be 
making the errors in ReadXRefSubsection either all throw 
ePdfError_ValueOutOfRange or all throw ePdfError_InvalidXRef.
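
A sketch of the first fix, assuming a hypothetical InvalidXRefError type in place of PdfError with ePdfError_InvalidXRef (this is not the attached patch). Catching std::exception covers std::length_error, std::bad_alloc and any other implementation-specific exception, and translating it into one library error gives callers a single error to handle, which also bears on the second point:

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <vector>

// Hypothetical library error standing in for PdfError( ePdfError_InvalidXRef ).
struct InvalidXRefError : std::exception
{
    const char* what() const noexcept override { return "invalid xref table"; }
};

struct XRefEntry { long long offset = 0; long generation = 0; char type = 0; };  // hypothetical

// Grows the offsets table, converting any standard-library failure into the
// single library error so callers only need one catch clause.
void GrowOffsets( std::vector<XRefEntry>& offsets, std::size_t count )
{
    try
    {
        offsets.resize( count );  // may throw std::length_error or std::bad_alloc
    }
    catch( const std::exception& )
    {
        throw InvalidXRefError();
    }
}
```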

I can also submit the parser unit tests now, but I was planning to wait 
until the 0.9.6 release was complete.

Cheers
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch-length-error.diff
Description: patch-length-error.diff


Re: [Podofo-users] trunk does not report correct number of annotations

2017-06-07 Thread Mark Rogers
The patch for ReadXRefSubsection in r1840 has broken PDFs that contain XRef 
subsections.

- At r1839 m_offsets.resize was only called if m_offsets needed to grow
- At r1840 m_offsets.resize is always called which means it can shrink the 
m_offsets table and wipe out existing objects

For http://heeen.de/test2-annotated.pdf, r1839 did this:

1) PdfParser::ReadDocumentStructure resizes the m_offsets table to 126 objects, then reads into m_offsets
2) Doesn’t resize m_offsets in PdfParser::ReadXRefSubsection (already big enough)
3) Doesn’t resize m_offsets in PdfParser::ReadXRefSubsection (already big enough)

For http://heeen.de/test2-annotated.pdf, r1840 does this:

1) PdfParser::ReadDocumentStructure resizes the m_offsets table to 126 objects, then reads into m_offsets
2) Resizes m_offsets to 1 in PdfParser::ReadXRefSubsection (this wipes out most of the objects)
3) Resizes m_offsets to 4 in PdfParser::ReadXRefSubsection

The example in H.7.1 on page 712 of PDF32000_2008.pdf shows the problem – the 
first xref subsection has nFirstObject=0, nNumObjects=1, so ReadXRefSubsection 
calls m_offsets.resize(1) which wipes out the previously read objects (4, 7, 8, 
9, 10 and 11)
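
The difference between r1839 and r1840 comes down to whether the table may shrink. A minimal grow-only sketch, assuming a hypothetical XRefEntry and EnsureTableSize rather than PoDoFo's TXRefEntry and m_offsets code (overflow checks on the required size are omitted here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct XRefEntry { long long offset = 0; long generation = 0; char type = 0; };  // hypothetical

// r1839-style behaviour: only ever grow the table, so a later subsection such
// as "0 1" cannot discard entries that were read earlier.
void EnsureTableSize( std::vector<XRefEntry>& offsets, std::size_t required )
{
    if( required > offsets.size() )
        offsets.resize( required );
}
```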

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 



On 02/06/2017, 11:55, "zyx"  wrote:

On Fri, 2017-06-02 at 11:58 +0200, Florian Hänel wrote:
> WARNING: There are more objects (4) in this XRef table than specified
> in the size key of the trailer directory (1)!
> WARNING: There are more objects (126) in this XRef table than
> specified in the size key of the trailer directory (4)!

Hi,
the file you have seems to be broken. Could you test with a valid and
fully compliant PDF file, please?
Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz







Re: [Podofo-users] CVE fix proposal

2017-04-21 Thread Mark Rogers
The revised patch doesn’t compile because it uses:

+PODOFO_ERROR_INFO( ePdfError_ValueOutOfRange,
+"xref subsection's given entry numbers together too large" );

instead of

+PODOFO_RAISE_ERROR_INFO ( ePdfError_ValueOutOfRange,
+"xref subsection's given entry numbers together too large" );


On 09/04/2017, 22:17, "Matthew Brincke"  wrote:

Hi zyx, hi all,

I've replaced the asserts with PODOFO_RAISE_ERROR_INFO calls
in if checks and removed the size_t check (replacing it with
a check more to the point and C++-like), so I hope the change
is now ready for inclusion as attached to this e-mail (or maybe
with minor edits, still crediting me, please).
(I, like Mark, also haven't tested whether it actually fixes
CVE-2017-5855; if it doesn't, please still accept it for fixing
the other two.)

Best regards, mabri

zyx  has written on 9 April 2017 at 13:33:
> 
> On Sat, 2017-04-08 at 18:32 +0200, Matthew Brincke wrote:
> 
> > *   PODOFO_ASSERT( nFirstObject > 0 );
> 
> Hi,
> I do not like asserts, unless being used in unit tests or such places.
> Especially this place is used to parse random data from outside, which
> the library has no control of, then it's not a good idea to abort whole
> application due to the broken/unexpected input. I know PODOFO_ASSERT()
> is sensitive for debug builds, but anyway.
> 
> > *   PODOFO_ASSERT( sizeof(PdfParser::s_nMaxObjects) <= sizeof(size_t) );
> 
> sizeof() tells you how many bytes the argument holds. Is there a typo
> in this test?
> 
> I didn't run either of the proposed patches yet, though I agree with
> Matthew that if the checks can be done without ABI changes, then it'll
> be a better option.
> 
> Bye,
>  zyx
>



[Podofo-users] Fix CVE-2017-7379: encoding array too short to encode/decode code point 0xffff

2017-04-19 Thread Mark Rogers
Previously the encoding table for PdfSimpleEncoding contained 0xffff entries. 
This was one entry too short to encode code point 0xffff.
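
Why 0xffff entries is one too few, as a sketch assuming the table is indexed directly by a 16-bit code point (kEncodingTableSize and CanIndex are hypothetical; the patch itself is not reproduced here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A table with 0xffff entries has valid indices 0..0xfffe, so code point 0xffff
// falls one past the end; 0x10000 entries cover every std::uint16_t value.
constexpr std::size_t kEncodingTableSize = 0x10000;

bool CanIndex( const std::vector<char>& table, std::uint16_t codePoint )
{
    return static_cast<std::size_t>( codePoint ) < table.size();
}
```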

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch-CVE-2017-7379.diff
Description: patch-CVE-2017-7379.diff


[Podofo-users] Fix: infinite loop in GetPageNumber() if Parent chain contains a loop

2017-04-19 Thread Mark Rogers
PdfPage::GetPageNumber goes into an infinite while loop if the “Parent” chain 
contains a loop.

This is caused by the same underlying problem as CVE-2017-5852 (although it’s an 
infinite loop rather than infinite recursion).
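
A depth counter bounds any loop, but a visited set detects a cycle of any length directly. A sketch of that alternative, assuming a hypothetical PageNode in place of PdfObject and its /Parent key; it is not the attached patch:

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_set>

struct PageNode { const PageNode* parent; };  // hypothetical stand-in for a page tree node

// Counts the ancestors of a node, throwing instead of looping forever when the
// "Parent" chain revisits a node (a self-loop or a longer cycle).
int CountParents( const PageNode* node )
{
    std::unordered_set<const PageNode*> visited;
    int depth = 0;
    for( const PageNode* p = node->parent; p != nullptr; p = p->parent )
    {
        if( !visited.insert( p ).second )  // insert fails if p was already seen
            throw std::runtime_error( "cycle in Parent chain" );
        ++depth;
    }
    return depth;
}
```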

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch -CVE-2017-5852-related.diff
Description: patch -CVE-2017-5852-related.diff


[Podofo-users] Fix CVE-2017-7378: out-by-one buffer read scanning string

2017-04-19 Thread Mark Rogers
This fixes an out-by-one buffer read caused by string loop control using

for( i=0;i<=lStringLen;i++ )

instead of

for( i=0;i<lStringLen;i++ )

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch-CVE-2917-7378.diff
Description: patch-CVE-2917-7378.diff
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] CVE fix proposal

2017-04-08 Thread Mark Rogers
Hi

These are the CVEs in ReadXRefSubsection:

CVE-2017-6844 global buffer overflow in PdfParser::ReadXRefSubsection
CVE-2017-5855 NULL pointer dereference in PdfParser::ReadXRefSubsection
CVE-2017-5853 signed integer overflow in PdfParser.cpp

CVE-2017-6844 and CVE-2017-5853 are caused by 3 related problems:

1) The addition in if ( nFirstObject + nNumObjects > m_nNumObjects ) overflows 
pdf_int64 if nFirstObject + nNumObjects > INT64_MAX. pdf_int64 is signed, so the 
overflow is undefined behaviour and in practice typically wraps to a negative 
number, which means m_offsets.resize() is not always called when it should be. 
This leads to buffer overflow on 32-bit and 64-bit systems.
2) The nFirstObject and nNumObjects parameters of ReadXRefSubsection are 
pdf_int64, but nFirstObject + nNumObjects is stored in m_nNumObjects, which is a 
long, so the value is truncated on 32-bit systems if nFirstObject + 
nNumObjects > LONG_MAX (=INT32_MAX). This leads to buffer overflow on 32-bit 
systems.
3) On 32-bit systems pdf_int64 can store larger values than size_t (which is 
32 bits). That means m_offsets.resize() truncates nFirstObject + nNumObjects to 
size_t, which wraps on 32-bit systems if nFirstObject + nNumObjects > 
SIZE_MAX (=UINT32_MAX).
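
All three problems can be avoided by comparing against the limit before adding and by converting to size_t only after a range check. This is a sketch under stated assumptions, not the attached patch: std::int64_t stands in for pdf_int64, and s_maxObjects, SetMaxObjects and RequiredTableSize are hypothetical (the 8,388,607 default follows the PDF 32000-1:2008 limit discussed in the CVE-2018-5296 thread):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical configurable limit; callers may raise it at their own risk.
static std::int64_t s_maxObjects = 8388607;
void SetMaxObjects( std::int64_t maxObjects ) { s_maxObjects = maxObjects; }

// Returns the table size needed for an xref subsection without ever
// evaluating an overflowing expression.
std::size_t RequiredTableSize( std::int64_t firstObject, std::int64_t numObjects )
{
    if( firstObject < 0 || numObjects < 0 )
        throw std::out_of_range( "negative xref subsection values" );

    // Problem 1: check against the limit *before* adding. The subtraction is
    // safe because 0 <= firstObject <= s_maxObjects when it is evaluated.
    if( firstObject > s_maxObjects || numObjects > s_maxObjects - firstObject )
        throw std::out_of_range( "xref subsection's entry numbers together too large" );

    const std::int64_t total = firstObject + numObjects;  // cannot overflow now

    // Problems 2 and 3: keep the result 64-bit (no narrowing to long) and only
    // convert to size_t after checking that it fits on this platform.
    if( static_cast<std::uint64_t>( total ) > std::numeric_limits<std::size_t>::max() )
        throw std::out_of_range( "xref table too large for this platform" );
    return static_cast<std::size_t>( total );
}
```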

I don’t think mabri’s patch handles all of these cases, because it still has the 
nFirstObject + nNumObjects addition without an overflow guard.

I’ve attached a patch that should resolve the 32-bit and 64-bit overflows above 
(CVE-2017-6844 and CVE-2017-5853)

The patch may also resolve CVE-2017-5855, but I’ve not been able to confirm 
that yet.

Best Regards
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
 


On 08/04/2017, 01:16, "Matthew Brincke"  wrote:

Hi zyx, hi all,

I've addressed the below concerns, and run-tested as I could, pity
my GCC 5.2.1 didn't find a difference with/out patch with the option
-fsanitize=undefined, except for my diagnostic message ;-(. I used
the same test program as earlier. Please review the patch attached.

> zyx  has written on 28 February 2017 at 08:38:
> 
> On Tue, 2017-02-28 at 00:14 +0100, Matthew Brincke wrote:
> 
> > I haven't completed testing yet
> 
> Hi,
> thanks for the patch. Just from a quick read of the proposed change:
> 
> > *   const pdf_int64 maxNum
> > *   = static_cast(std::numeric_limits::max());
> 
> As far as I know, 'long' type is architectural dependant, 32 bits on
> 32bit arch and 64 bits on 64bit arch, thus it produces different
> values. Avoiding a 'long' usage might be a general benefit.
> 
> > *   "(%ld)!\n",
> > *   nFirstObject + nNumObjects, m_nNumObjects ); // 2nd arg is long!
> 
> The %ld is incorrect for the same reason. There are defines for proper
> formats, or cast the second argument to pdf_int64 instead and use the
> format specifier as before.
> 
> > *   ") in this XRef table than supported by this version of PoDoFo, "
> 
> This sounds odd to me, are you sure it's about what PoDoFo supports,
> not about what the standard supports? I mean, the standard suggests to
> stay in those limits even if the writer runs on a system which can
> cover more objects, to be compatible with 32-bit systems (because you
> never know on which system the reader runs).
>  Bye,
>  zyx
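
Both of zyx's suggestions can be combined. In this sketch std::int64_t stands in for pdf_int64, PRId64 from <cinttypes> supplies the matching conversion, and the long count is widened before formatting; FormatXRefWarning is a hypothetical helper, and its message text matches the warning quoted by Florian Hänel in the annotations thread:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// PRId64 expands to the right conversion for int64_t on every platform, and the
// 'long' count is widened so one specifier works whether long is 32 or 64 bits.
std::string FormatXRefWarning( std::int64_t firstObject, std::int64_t numObjects, long numObjectsInTrailer )
{
    char buf[160];
    std::snprintf( buf, sizeof( buf ),
                   "There are more objects (%" PRId64 ") in this XRef table than specified "
                   "in the size key of the trailer directory (%" PRId64 ")!",
                   firstObject + numObjects, static_cast<std::int64_t>( numObjectsInTrailer ) );
    return buf;
}
```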



patch-CVE-2017-6844.diff
Description: patch-CVE-2017-6844.diff


Re: [Podofo-users] another bunch of crashes

2017-04-07 Thread Mark Rogers
Hi

I’ve been doing some patching over the past couple of days and have patches for 
most of the CVEs.

I think the patch in r1835 fixes the case where pObj == pObj->GetParent() but I 
don’t think it fixes cases where pObj == pObj->GetParent()->GetParent() or 
pObj->GetParent() == pObj->GetParent()->GetParent(). There’s also the problem 
of an attacker deliberately creating a PDF with very deeply nested objects to 
cause a stack overflow.

This patch adds a recursion depth counter and throws an error if the recursion 
gets too deep. It’s probably worth combining the patches since the pObj == 
pObj->GetParent() case is probably the most common, but the depth check covers 
other types of loops in the “Parent” structure and protects against deeply 
nested PDFs

Best Regards
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
 



On 07/04/2017, 19:10, "zyx"  wrote:

On Thu, 2017-03-02 at 17:31 +0100, Agostino Sarubbo wrote:
>

Hi,
I tried on couple of CVE-s, using trunk at revision 1834. I chose to
behave in a non-forgiving way, but feel free to discuss those
"solutions" here, if you can think of anything better.

CVE-2017-5852 - fixed with revision 1835:
http://sourceforge.net/p/podofo/code/1835

CVE-2017-5854 - fixed with revision 1836:
http://sourceforge.net/p/podofo/code/1836

CVE-2017-5886 - fixed with revision 1837:
http://sourceforge.net/p/podofo/code/1837

Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz






patch-CVE-2017-5852.diff
Description: patch-CVE-2017-5852.diff


Re: [Podofo-users] another bunch of crashes

2017-03-30 Thread Mark Rogers
Is there any way to use SourceForge tickets just for security bugs?

It looks like some CVEs have been fixed, some CVE patches rejected, but there’s 
no way from the mailing list to tell which CVEs have been fixed because most of 
the mailing list and commit messages don’t reference the CVEs.

At the moment it’s hard even to contribute patches because there’s no way to 
tell which CVEs are fixed, which are being worked on, and which are still 
outstanding.

If SourceForge tickets don’t work, is there another alternative, for example 
an empty GitHub repo with an issue tracker?

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 


On 19/03/2017, 18:51, "Mattia Rizzolo"  wrote:

On Mon, Mar 13, 2017 at 01:39:00PM +0100, Mattia Rizzolo wrote:
> On Thu, Mar 02, 2017 at 05:31:34PM +0100, Agostino Sarubbo wrote:
> > Please consider the following:
> > 
> > …
> 
> All of these now have CVEs associated.

And apparently the Debian release team is considering these severe
enough to warrant removing libpodofo from the next debian stable release
rather than leaving them unfixed ().
I severely lack time (and real proper knowledge) to start to help with
these, but I'd appreciate if you could prioritize them.

> I find the Debian view for security issues particularly nice to look at:
> https://security-tracker.debian.org/tracker/source-package/libpodofo

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
more about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-




Re: [Podofo-users] 0.9.5 regression, pdfImage.GetObject()->GetDictionary() throws exception

2017-02-17 Thread Mark Rogers
I think the best solution is reverting r1810, then using pdf_int8 to store the enum 
internally.

The only changes needed after reverting are:

change
   EPdfDataType m_eDataType;
to
   pdf_int8 m_eDataType;

and in PdfVariant::GetDataType change

   return m_eDataType;
to
   return (EPdfDataType)m_eDataType;


I’ll do some testing, then submit a patch.
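A minimal sketch of the idea, assuming simplified stand-ins (EPdfDataTypeSketch and VariantSketch are hypothetical names, not PoDoFo's real declarations): the public enum keeps its default underlying type, while the class stores the value in a fixed one-byte member and casts it back in the getter, so the member's size no longer depends on the -std setting. It uses std::int8_t from C++11's <cstdint>; in PoDoFo itself the pdf_int8 typedef would play that role.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for EPdfDataType: the public API keeps a plain enum.
enum EPdfDataTypeSketch {
    eSketchType_Bool,
    eSketchType_Number,
    eSketchType_Name,
    eSketchType_Unknown
};

// Hypothetical stand-in for PdfVariant: the enum is stored in one byte,
// so the layout is the same whichever C++ standard a client compiles with.
class VariantSketch {
public:
    explicit VariantSketch(EPdfDataTypeSketch type)
        : m_eDataType(static_cast<std::int8_t>(type)) {}

    // The getter still returns the enum, so source compatibility is kept.
    EPdfDataTypeSketch GetDataType() const {
        return static_cast<EPdfDataTypeSketch>(m_eDataType);
    }

private:
    std::int8_t m_eDataType;  // was: EPdfDataTypeSketch m_eDataType;
};
```

Only the private member changes type; callers keep seeing the enum.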

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 


On 16/02/2017, 18:50, "zyx"  wrote:

On Sat, 2017-02-11 at 03:12 +0100, Sandro Mani wrote:
> So rather than just removing -std=c++98 from the CMakeLists, the
> code must be changed to ensure the size of the enum is always the
> same regardless of the language standard one is using.

Hi,
I would ideally use something like the attached change, but it also has
caveats. 

It lets me compile the test example with any standard, even with c++98,
while keeping PoDoFo built with the default standard for the
compiler. But as the enum is defined in a public header, when I
use lower than c++11 (more precisely, when my compilation doesn't have the
required standard version, but PoDoFo had been built with a new-enough
C++ standard), then I get a warning from gcc:

   src/base/PdfCompilerCompat.h:202:29: warning: scoped enums only
   available with -std=c++11 or -std=gnu++11
 #define PODOFO_ENUM_UINT8 : uint8_t

Thus it could, theoretically, cause trouble. I do not see any good
solution for this, except maybe to revert this change (r1810) altogether
and change PdfVariant to not hold
   EPdfDataType m_eDataType;
but
   pdf_int8 m_eDataType;
instead, which would possibly require some other internal changes. I
would not change the public API otherwise; the functions would still
use the enum.
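To illustrate the hazard Sandro describes, here is a sketch in which the enum base is chosen per translation unit from __cplusplus. The SKETCH_ENUM_UINT8 macro and the enum are hypothetical, not the real PdfCompilerCompat.h code. The enum's size then follows whatever standard each file is compiled with, which is what lets a library and its clients disagree on the layout of any class storing the enum by value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: use the C++11 enum-base syntax only when this
// translation unit is compiled as C++11 or later.
// (MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is used.)
#if __cplusplus >= 201103L
#  define SKETCH_ENUM_UINT8 : unsigned char
#else
#  define SKETCH_ENUM_UINT8
#endif

enum EDataTypeSketch SKETCH_ENUM_UINT8 {
    eDataTypeSketch_Bool,
    eDataTypeSketch_Number,
    eDataTypeSketch_Unknown = 0xff
};

// 1 byte in a C++11 translation unit, typically sizeof(int) in a C++98 one:
// a mismatch between library and client if both include this header.
const std::size_t kDataTypeSketchSize = sizeof(EDataTypeSketch);
```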

Opinions?
Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz



Re: [Podofo-users] PoDoFo 0.9.5.-rc1 Read for Download

2017-01-22 Thread Mark Rogers
Hi

Further info on this: in C++11 destructors are implicitly noexcept (unless a 
base or member destructor is noexcept(false)), so by default terminate() is 
called if a destructor throws an exception.

You can override this with noexcept(false) on the destructor, but the runtime 
will still call terminate() if it’s already processing an exception (as in my 
code example).

Given the range of different compilers PoDoFo needs to support, I don’t think 
you can ever safely throw an exception in a PoDoFo destructor.
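A small sketch of the rules above, using hypothetical types only: a user-declared destructor is noexcept by default in C++11, and opting out with noexcept(false) only makes the exception catchable when no other exception is already propagating.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

struct ImplicitDtor {
    ~ImplicitDtor() {}                  // implicitly noexcept(true) since C++11
};

struct OptOutDtor {
    ~OptOutDtor() noexcept(false) {}    // explicitly allowed to throw
};

static_assert(std::is_nothrow_destructible<ImplicitDtor>::value,
              "destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<OptOutDtor>::value,
              "noexcept(false) removes the guarantee");

struct ThrowingDtor {
    ~ThrowingDtor() noexcept(false) { throw std::runtime_error("thrown from destructor"); }
};

// No other exception is in flight when ~ThrowingDtor runs, so this is catchable.
// If ~ThrowingDtor ran during stack unwinding instead, std::terminate() would be called.
bool CatchFromDestructor() {
    try {
        ThrowingDtor t;
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```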

Best Regards
Mark



Re: [Podofo-users] PoDoFo 0.9.5.-rc1 Read for Download

2017-01-22 Thread Mark Rogers
Hi

If a destructor called during stack unwinding throws an exception, then C++ 
guarantees the process will be killed by calling terminate()
https://isocpp.org/wiki/faq/exceptions#dtors-shouldnt-throw
http://stackoverflow.com/questions/5798107/c-throwing-an-exception-from-a-destructor

That means any code like this terminates the process, instead of reporting an 
exception:
 
try
{
    PdfPainter painter;

    painter.SetPage( pCanvas );

    // calls PODOFO_RAISE_ERROR because gray is out of range,
    // but could be any PoDoFo method that raises an exception
    painter.SetStrokingGray( 1.1 );

    // never reaches here if an exception is thrown
    painter.FinishPage();
}
catch ( PoDoFo::PdfError& error )
{
    // ~PdfPainter is called before handling this, but FinishPage() hasn't been called,
    // so ~PdfPainter throws an exception, which then terminates the application
    // because an exception is already being handled
    error.PrintErrorMsg();
}

If an exception isn’t being handled when ~PdfPainter throws an exception, 
then: 

- the behavior is undocumented
- throwing in the destructor when destroying an array or collection of 
PdfPainters means some destructors are never called

Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 



On 22/01/2017, 11:02, "zyx"  wrote:

On 20.1.2017 20:49, Mark Rogers wrote:
> There‘s a problem in release builds. PdfPainter::~PdfPainter throws
> exceptions in release builds because PODOFO_ASSERT throws exceptions
> via PODOFO_RAISE_ERROR_INFO in release builds

Hi,
that's after one of my changes.

> if( m_pCanvas ) PdfError::LogMessage( eLogSeverity_Error,
> "PdfPainter::~PdfPainter(): FinishPage() has to be called after a
> page is completed!" );

Are you sure you call the FinishPage() before the PdfPainter is 
destroyed? The error suggests that it's not done, but it should be. 
Alternatively, if you face this in any of the PoDoFo examples or tools, 
then they are supposed to be fixed.

In other words, I believe that this error is valid and should not be 
muted for non-debug builds.
Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz




Re: [Podofo-users] PoDoFo 0.9.5.-rc1 Read for Download

2017-01-20 Thread Mark Rogers
Hi

There’s a problem in release builds: PdfPainter::~PdfPainter throws exceptions, 
because in release builds PODOFO_ASSERT throws exceptions via 
PODOFO_RAISE_ERROR_INFO.

PdfPainter::~PdfPainter()
{
    // Throwing exceptions in C++ destructors is not allowed.
    // Just log the error.
    // PODOFO_RAISE_LOGIC_IF( m_pCanvas, "FinishPage() has to be called after a page is completed!" );
    // Note that we can't do this for the user, since FinishPage() might
    // throw and we can't safely have that in a dtor. That also means
    // we can't throw here, but must abort.
    if( m_pCanvas )
        PdfError::LogMessage( eLogSeverity_Error,
                              "PdfPainter::~PdfPainter(): FinishPage() has to be called after a page is completed!" );

    PODOFO_ASSERT( !m_pCanvas );
}

The simplest fix is to delete the assert, or to put an #ifdef DEBUG around it:

#ifdef DEBUG
PODOFO_ASSERT( !m_pCanvas );
#endif
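For comparison, a sketch of a destructor that only reports the missing FinishPage() call, so it can't terminate the process during stack unwinding. PainterSketch is a hypothetical stand-in for PdfPainter, and std::out_of_range stands in for PdfError.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

class PainterSketch {
public:
    PainterSketch() : m_pageOpen(true) {}

    void SetStrokingGray(double gray) {
        if (gray < 0.0 || gray > 1.0)
            throw std::out_of_range("gray must be in the range 0.0 to 1.0");
    }

    void FinishPage() { m_pageOpen = false; }

    // Never throws (implicitly noexcept): it only logs, so it is safe
    // to run while another exception is propagating.
    ~PainterSketch() {
        if (m_pageOpen)
            std::fprintf(stderr, "~PainterSketch(): FinishPage() was not called\n");
    }

private:
    bool m_pageOpen;
};

// Mirrors the failing example in this thread: returns true when the error
// reaches the catch block instead of the process being terminated.
bool DrawWithOutOfRangeGray() {
    try {
        PainterSketch painter;
        painter.SetStrokingGray(1.1);  // throws; ~PainterSketch runs during unwinding
        painter.FinishPage();          // not reached
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```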

Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL






From: Dominik Seichter 
Date: Thursday, 19 January 2017 at 20:10
To: Mark Rogers 
Cc: zyx , "podofo-users@lists.sourceforge.net" 

Subject: Re: [Podofo-users] PoDoFo 0.9.5.-rc1 Read for Download

Hi Mark,

Thanks for the patch. As it is only a minor change for a test I applied it now.
Committed to revision 1816.

Best regards,
 Dominik



Re: [Podofo-users] PoDoFo 0.9.5.-rc1 Read for Download

2017-01-19 Thread Mark Rogers
Hi

I built and tested 0.9.5-rc1 on Windows 32-bit and Mac 64-bit. No problems 
so far.

I did see one warning about a type conversion in one of the unit tests 
(pdf_long return value stored in an int). I’ve attached a patch to fix it, but 
this can wait till after the release.

Cheers
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 



On 17/01/2017, 08:17, "zyx"  wrote:

On Mon, 2017-01-16 at 20:25 +0100, Dominik Seichter wrote:
> Please report any issues you have with this release candidate.

Hi,
there's one pending change, from Jaseem Ali, "Multiline Text Bug",
which should surely be included in the final 0.9.5, I only hope that
Jaseem will be able to provide the patch as soon as possible.
Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz






patch-podofo4.diff
Description: patch-podofo4.diff


Re: [Podofo-users] PoDoFo 0.9.5 Release Plan

2017-01-16 Thread Mark Rogers
Hi Zyx

Here’s a final patch for this release: it swaps the order of two data members 
in PdfReference, which makes PdfObject shrink from 64 bytes to 56 bytes on 
64-bit platforms (no size change on 32-bit platforms).

On some PDFs this saves 12.8% allocated memory, and on one particular document 
it reduces memory requirements from 3317 MB to 2898 MB (saving 419 MB) on 
64-bit platforms. This is in addition to memory saved by previous patches.

BTW there’s still a lot of wasted space inside PdfObject: about 28% of the 
space allocated for PdfObject is alignment padding on 32-bit platforms. 
Eliminating this would reduce the memory used by PdfObject by up to 28%, but 
requires bigger changes than just re-ordering members. I’ll look at that for the next release.
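A sketch of why swapping two members can shrink a class, using hypothetical structs rather than the real PdfReference. C++ keeps members with the same access control in declaration order, so the compiler inserts padding to align the pointer; moving a small member next to another small member lets them share that gap.

```cpp
#include <cassert>
#include <cstdint>

struct PaddedLayout {
    std::uint8_t flagA;   // 1 byte, then padding so ptr is pointer-aligned
    const void*  ptr;
    std::uint8_t flagB;   // 1 byte, then tail padding
};                        // typically 24 bytes on 64-bit ABIs, 12 on 32-bit

struct SwappedLayout {
    std::uint8_t flagA;
    std::uint8_t flagB;   // swapped with ptr: now shares flagA's padding gap
    const void*  ptr;
};                        // typically 16 bytes on 64-bit ABIs, 8 on 32-bit
```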

Cheers
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
 




On 11/01/2017, 20:34, "zyx"  wrote:

On Tue, 2017-01-10 at 14:23 +0000, Mark Rogers wrote:
> Here are the patches:

Hi,
both patches look fine from my point of view, thus I committed them as
revision 1809 and revision 1810:
http://sourceforge.net/p/podofo/code/1809
http://sourceforge.net/p/podofo/code/1810

Thanks and bye,
zyx
-- 
http://www.litePDF.cz i...@litepdf.cz






PoDoFo-patch3.diff
Description: PoDoFo-patch3.diff


Re: [Podofo-users] PdfName memory usage

2017-01-14 Thread Mark Rogers
Hi Dennis, Zyx

I’d worry that locking introduces performance issues. That’s one of the reasons 
std::string reference counting was dropped in favour of the small string 
optimization in C++ standard library implementations (the locks caused a lot of contention 
in unexpected places). The reference counting in std::string is also why 
std::string in Visual C++ 6.0 wasn’t thread safe. There’s more detail here:
http://info.prelert.com/blog/cpp-stdstring-implementations

I think there’s a simpler solution that avoids locking. The design is based on 
the HTML5 parser used in Firefox (a long-lived, multi-threaded application). It 
has a set of pre-defined atoms for HTML element and attribute names like 
“html”, “body”, “div”, “href” etc. These correspond to the PdfNames defined 
in the PDF spec (“Kids”, “Length” etc.).

Would this work?

1) Use a pre-defined name table containing all the PdfNames 
defined in the PDF spec (or just the common ones: “Kids”, “P” etc.). Once 
initialised, the name table is read-only, so no locking is required to search it.
2) Change TKeyMap from std::map to  
std::map
3) When adding an entry to TKeyMap, check whether it already exists in the pre-defined 
name table. If it doesn’t, dynamically allocate it via new PdfName()
4) There’s no thread contention because names are either global and allocated 
at init time, or private and owned by the dictionary (as in PoDoFo currently).
5) Logic for PdfDictionary then becomes something like this:

std::map< std::string, PdfName* > PdfName::s_predefinedNames;

// hypothetical init function: C++ has no static constructors,
// so fill the table once at start-up
void PdfName::InitPredefinedNames()
{
    s_predefinedNames[ "Kids" ] = new PdfName( "Kids" );
    s_predefinedNames[ "P" ]    = new PdfName( "P" );
    // etc
}

void PdfDictionary::AddKey( const PdfName & identifier, const PdfObject & rObject )
{
    // snipped parameter checks for brevity

    // check if this is a pre-defined name (should account for most/all names used);
    // use find() rather than operator[], which would insert into the shared table
    // and so would need locking
    PdfName* pPdfName = NULL;
    std::map< std::string, PdfName* >::const_iterator found =
        PdfName::s_predefinedNames.find( identifier.GetName() );
    if ( found != PdfName::s_predefinedNames.end() )
        pPdfName = found->second;

    if ( pPdfName == NULL )
    {
        // dynamically allocate name for unknown key which may be an extension
        // or from a newer version of PDF than supported by library
        pPdfName = new PdfName( identifier );

        // delete dynamically allocated PdfNames - see Clear() below
        pPdfName->SetAutoDelete( true );
    }

    if( m_mapKeys.find( pPdfName ) != m_mapKeys.end() )
    {
        delete m_mapKeys[pPdfName];
        m_mapKeys.erase( pPdfName );
    }

    m_mapKeys[pPdfName] = new PdfObject( rObject );
    m_bDirty = true;
}

void PdfDictionary::Clear()
{
    AssertMutable();

    if( !m_mapKeys.empty() )
    {
        TIKeyMap it = m_mapKeys.begin();
        while( it != m_mapKeys.end() )
        {
            // call delete on PdfNames allocated by AddKey,
            // but don't delete PdfNames in s_predefinedNames
            if( (*it).first->GetAutoDelete() )
                delete (*it).first;

            delete (*it).second;
            ++it;
        }

        m_mapKeys.clear();
    }
}
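A self-contained sketch of the read-only lookup in steps 1 and 3, assuming a hypothetical PredefinedNames class and plain std::string names in place of PdfName. The table is built once (function-local statics are initialised thread-safely since C++11) and only read through find() afterwards, so lookups need no lock and every match returns the same shared object.

```cpp
#include <cassert>
#include <set>
#include <string>

class PredefinedNames {
public:
    static const PredefinedNames& Instance() {
        static const PredefinedNames table;  // built once, never modified afterwards
        return table;
    }

    // Returns the shared copy of a predefined name, or nullptr for unknown
    // (e.g. extension) names. find() does not modify the set, so concurrent
    // lookups are safe without a mutex.
    const std::string* Find(const std::string& name) const {
        std::set<std::string>::const_iterator it = m_names.find(name);
        return it == m_names.end() ? nullptr : &*it;
    }

private:
    PredefinedNames() : m_names{"Kids", "Length", "P", "Parent", "Type"} {}
    std::set<std::string> m_names;
};
```

Unknown names (the nullptr case) would still be allocated per dictionary, as in step 3.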

Thoughts?

Best Regards
Mark

-- 
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
 


On 14/01/2017, 22:38, "zyx"  wrote:

On Sat, 2017-01-14 at 10:19 -0800, Dennis Jenkins wrote:
> If you want a global cache of the PdfNames, I request that you add
> reference counting and protect access with a mutex (or similarly
> appropriate, cross-platform, synchronization barrier).

Hi,
that's precisely what I had in mind, involving PdfMutex, thus fully
depending on the build options (-DPODOFO_MULTI_THREAD). I can write
functional code to show it in detail, if you'd like.
Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz






Re: [Podofo-users] PdfName memory usage

2017-01-13 Thread Mark Rogers
Hi 

I think it would probably need to be a static rather than per-document, because 
you won’t always have a reference to an owner document when dictionaries are 
being modified.

class PdfName
{
    static std::map< std::string, PdfName > s_nameTable;
    // ...
};

That would mean keys are always valid (the name table would only be destroyed 
when the app exits)

One issue is threading – I’m not sure what guarantees PoDoFo gives around 
threading, but this makes dictionary operations non-thread safe unless locking 
is added.

Current memory layout in 32-bit windows is:

class PoDoFo::PdfName size(36):
0   (base class PoDoFo::PdfDataType)
0   {vfptr} (size=4)
4   m_bImmutable (size=1)
|(size=3)
8   std::string m_Data (size =24)
plus
8 bytes HeapAlloc overhead
8 bytes malloc overhead

Total: 36+8+8 =52 bytes per dictionary key

If the std::string was changed to a ref counted pointer this would change to 
something like:

class PoDoFo::PdfName size(16):
0   (base class PoDoFo::PdfDataType)
0   {vfptr} (size=4)
4   m_bImmutable (size=1)
|(size=3)
8   std::string* m_Data (size =4)
plus
8 bytes HeapAlloc overhead
8 bytes malloc overhead

Total: 16+8+8 = 32 bytes per dictionary key

If the dictionary keys are changed to PdfName& or PdfName* then it changes to

Total: 4 bytes per dictionary key on 32-bit (sizeof(PdfName*) = sizeof(void*) = 4 bytes) 
with no heap overhead
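A rough sizeof comparison, using hypothetical structs modelled loosely on the layout dump above (a polymorphic base holding a bool, plus the name data). Exact sizes depend on the compiler and standard library (an old copy-on-write std::string can be pointer-sized), so this only checks that storing a pointer is no larger than storing the whole std::string.

```cpp
#include <cassert>
#include <string>

struct DataTypeSketch {
    virtual ~DataTypeSketch() {}
    bool m_bImmutable = false;
};

struct NameByValue : DataTypeSketch {
    std::string m_Data;                    // whole string object in every key
};

struct NameByPointer : DataTypeSketch {
    const std::string* m_Data = nullptr;   // points at one shared string
};
```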

Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 

 




[Podofo-users] PdfName memory usage

2017-01-11 Thread Mark Rogers
Hi Dom, Zyx

I’ve been looking at PoDoFo memory usage on large documents.

The PDF spec is 8.7 MB on disk, but uses around 200 MB of RAM when loaded into 
a PdfMemDocument
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Memory usage is:

850,000 PdfNames  using about 70 MB, which are mostly PdfDictionary keys
125,000 PdfObjects using about 10 MB

A lot of the PdfNames are duplicated dictionary keys appearing in most/all 
objects (e.g. “Kids”, “Length”, “Parent” etc)

Eliminating the duplication should save a lot of memory:


-  Create a single document name table, something like
std::map< std::string , PdfName  > m_nameTable;


-  Change TKeyMap from
typedef std::map  TKeyMap; // stores PdfName in every object key: 36 bytes for sizeof(PdfName)
                           // + 24 bytes HeapAlloc overhead + PdfName::m_Data.length()
to
typedef std::map  TKeyMap; // stores reference (4 or 8 byte pointer) in every object key



-  When keys are added to a PdfDictionary, add them to the document 
name table if they don’t exist, then add the PdfName& reference to TKeyMap 
(referencing a document name table entry)
This should reduce memory usage for PdfName from 70 MB to about 4 MB in 
PDF32000_2008.pdf (roughly 850,000 keys × 4-byte pointers ≈ 3.4 MB on 32-bit).

Is this worth doing? Can you think of any problems this might cause?

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



Re: [Podofo-users] PoDoFo 0.9.5 Release Plan

2017-01-10 Thread Mark Rogers
Hi

Here are the patches:

PoDoFo-mem-1.diff

Moves some member variables within PdfObject – saves space by reducing compiler 
alignment padding

PoDoFo-mem-2.diff

Changes base type of EPdfDataType from int to uint8_t (on compilers that 
support C++ 11) – the smaller type saves space and reduces compiler alignment 
padding
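A minimal sketch of how the enum's underlying type affects padding, using hypothetical structs rather than the real PdfVariant. With a 4-byte int-backed enum, the following bool and 16-bit member need alignment padding, while a uint8_t-backed enum lets all three pack into 4 bytes on typical ABIs. The fixed underlying type syntax (enum X : std::uint8_t) requires C++11.

```cpp
#include <cassert>
#include <cstdint>

enum IntBackedType : int { eIntBacked_Bool, eIntBacked_Number };
enum ByteBackedType : std::uint8_t { eByteBacked_Bool, eByteBacked_Number };

struct WithIntEnum {
    IntBackedType type;    // typically 4 bytes, 4-byte aligned
    bool          flag;    // offset 4, then 1 byte of padding
    std::uint16_t small;   // offset 6: typically 8 bytes in total
};

struct WithByteEnum {
    ByteBackedType type;   // 1 byte
    bool           flag;   // offset 1
    std::uint16_t  small;  // offset 2: typically 4 bytes in total
};
```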

Win 32-bit before

sizeof(PdfObject) = 64
sizeof(PdfVariant) = 32
sizeof(PdfReference) = 16
sizeof(PdfDataType) = 8

Win 32-bit after

sizeof(PdfObject) = 56
sizeof(PdfVariant) = 24
sizeof(PdfReference) = 16
sizeof(PdfDataType) = 8

moving PdfObject members saved 0 bytes in PdfObject
changing EPdfDataType to uint8_t saved 8 bytes in PdfVariant

PdfObject uses 12.5% less memory on 32-bit
PdfVariant uses 25% less memory on 32-bit

Mac 64-bit before

sizeof(PdfObject) = 80
sizeof(PdfVariant) = 32
sizeof(PdfReference) = 24
sizeof(PdfDataType) = 16

Mac 64-bit after

sizeof(PdfObject) = 64
sizeof(PdfVariant) = 24
sizeof(PdfReference) = 24
sizeof(PdfDataType) = 16

moving PdfObject members saved 8 bytes in PdfObject
changing EPdfDataType to uint8_t saved 8 bytes in PdfVariant

PdfObject uses 20% less memory on 64-bit
PdfVariant uses 25% less memory on 64-bit

I’ve tested compilation on Xcode 8 / Clang and Visual C++ 2015.

Cheers
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 
 










On 06/01/2017, 08:30, "zyx"  wrote:

On Thu, 2017-01-05 at 18:59 +0000, Mark Rogers wrote:
> I’m not certain if PoDoFo makes any guarantees on binary
> compatibility between releases (although C++ makes it hard to provide
> any guarantees on binary compatibility). These patches will break
> binary compatibility (by changing the memory layout of member
> variables) but won’t affect source compatibility.

Hi,
as Mattia said, there is no problem with binary compatibility. I
already changed the API, thus the soname version bump is required anyway.
 
> Do you want me to submit the patches?

Yes, please. The sooner the better, the code freeze is approaching
quickly.
Thanks and bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz






PoDoFo-mem-2.diff
Description: PoDoFo-mem-2.diff


PoDoFo-mem-1.diff
Description: PoDoFo-mem-1.diff


Re: [Podofo-users] PoDoFo 0.9.5 Release Plan

2017-01-05 Thread Mark Rogers
Hi Dom / Zyx

I have some patches which reduce PoDoFo memory usage by 10% on some platforms 
(e.g. around 10 MB less memory used loading the ISO 32000 spec PDF)

They are very simple – they just involve re-ordering class member variables to 
eliminate or reduce padding added by the compiler (in PdfObject there’s a lot 
of wasted space added by the compiler).

I’m not certain if PoDoFo makes any guarantees on binary compatibility between 
releases (although C++ makes it hard to provide any guarantees on binary 
compatibility). These patches will break binary compatibility (by changing the 
memory layout of member variables) but won’t affect source compatibility.

Do you want me to submit the patches?

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL




From: Dominik Seichter 
Date: Friday, 2 December 2016 at 15:57
To: Mark Rogers 
Subject: [Podofo-users] PoDoFo 0.9.5 Release Plan

Hello PoDoFo users and developers!

Zyx and I were discussing the timeline for the next PoDoFo release 0.9.5. 
In particular, the recent changes in the area of PDF signatures should be made 
available to a broader audience. Therefore we plan a new release soon!

We currently plan a code freeze starting in mid-January, with a first release 
candidate on January 16th. The final release is planned for February 2nd.

We would greatly appreciate additional contributions to this release. Please 
provide your patches in time to be integrated before the code freeze!

Best regards,
Dominik


[Podofo-users] Patch for PdfString underflow

2016-05-28 Thread Mark Rogers
Hi

I think I’ve got to the root cause of the reported problems in PdfString.

PdfStrings can be in 2 states: valid or invalid:

/** The string is valid if no error in the constructor has occurred.
 *  If it is valid it is safe to call all the other member functions.
 *  \returns true if this is a valid initialized PdfString
 */
inline bool IsValid() const;

The default PdfString constructor deliberately constructs an invalid string - 
this is used for things like PdfString::StringNull (which is different from an 
empty string) and is returned by various methods like 
PdfInfo::GetStringFromInfoDict, PdfField::GetFieldName, 
PdfField::GetAlternateName. There are other PdfString constructors that also 
create an invalid string: PdfString( (char*)NULL ) for example.

When IsValid() returns false, various undefined behaviours occur if the invalid 
PdfString is used: 

- GetLength / GetUnicodeLength / GetCharacterLength return -1 or -2
- ToUnicode faults accessing a NULL pointer
- PdfEncoding::ConvertToUnicode - tries to allocate (SIZE_MAX-1)/2 or 
(SIZE_MAX-2)/2 bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfSimpleEncoding::ConvertToUnicode - tries to allocate SIZE_MAX-1 bytes and 
throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfIdentityEncoding::ConvertToEncoding - tries to allocate SIZE_MAX-1 bytes 
and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfDifferenceEncoding::ConvertToUnicode tries to allocate SIZE_MAX-1 or 
SIZE_MAX-2  bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfDifferenceEncoding::ConvertToEncoding tries to allocate SIZE_MAX-1 or 
SIZE_MAX-2 bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfString comparison and equality operators may fault or throw 
ePdfError_OutOfMemory if they try to convert the encoding of one of the operands
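The huge allocation sizes are consistent with a negative length being converted to the unsigned size_t type somewhere along the way. A minimal illustration, with LengthAsSize as a hypothetical helper: converting a negative integer to an unsigned type is well defined and wraps modulo 2^N, so -1 becomes SIZE_MAX and -2 becomes SIZE_MAX - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a signed "length" passed where an unsigned size is expected.
std::size_t LengthAsSize(long length) {
    return static_cast<std::size_t>(length);  // wraps modulo 2^N for negative values
}
```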

I think the problems happen because none of the PoDoFo code checks 
PdfString::IsValid, apart from PdfString::GetStringUtf8. I would guess the same 
is true of most PoDoFo client code.

The patch gives PdfString methods documented, well-defined, safe behaviour if 
IsValid() returns false:

- PdfString::GetLength / PdfString::GetUnicodeLength / 
PdfString::GetCharacterLength return 0 (this prevents allocations of SIZE_MAX-1 
or SIZE_MAX-2)
- PdfString::ToUnicode returns an invalid string if it’s called on an invalid 
string
- the < and > operators return false if LHS and/or RHS are invalid 
- the == operator returns false if exactly one of LHS and RHS is invalid 
- the == operator returns true if both LHS and RHS are invalid 

The patch is designed to only change behaviour when the current behaviour is 
bad (i.e. access faults or out of memory errors). Where the current behaviour 
is reasonable there are no changes other than documenting the behaviour.
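A sketch of the proposed semantics, using a hypothetical StringSketch class that models only the validity rules listed above (it is not PdfString's implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class StringSketch {
public:
    StringSketch() : m_valid(false) {}  // like the default PdfString ctor: invalid
    explicit StringSketch(const std::string& s) : m_valid(true), m_data(s) {}

    bool IsValid() const { return m_valid; }

    // Invalid strings report length 0 instead of a negative value,
    // so callers sizing buffers from it never request near-SIZE_MAX bytes.
    std::size_t GetLength() const { return m_valid ? m_data.size() : 0; }

    bool operator==(const StringSketch& rhs) const {
        if (!m_valid || !rhs.m_valid)
            return !m_valid && !rhs.m_valid;  // equal only if both are invalid
        return m_data == rhs.m_data;
    }

    bool operator<(const StringSketch& rhs) const {
        if (!m_valid || !rhs.m_valid)
            return false;                     // invalid strings are never ordered
        return m_data < rhs.m_data;
    }

private:
    bool m_valid;
    std::string m_data;
};
```

One design consequence: because < returns false in both directions whenever either side is invalid, an invalid string is "equivalent" to every valid string, which breaks the strict weak ordering std::map and std::set require. Strings that might be invalid probably shouldn't be used as keys in ordered containers.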

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch-pdfstring-20160528.diff
Description: patch-pdfstring-20160528.diff



Re: [Podofo-users] Patch for warning RC4011: identifier truncated to 'PODOFO_COMPILER_LACKS_LL_LITERA'

2016-05-09 Thread Mark Rogers
Sorry, you can ignore this patch (it was against the 0.9.3 sources we use)

I just pulled the latest source from SVN and the resource compiler warning no 
longer appears in trunk (so the patch isn't needed).

PS: I did notice one build problem that might affect some users. The include 
wrapper podofo/podofo/PdfExtension.h is missing from SVN (running 
create_forward_headers.sh should fix this).

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 



Re: [Podofo-users] Patch for warning RC4011: identifier truncated to 'PODOFO_COMPILER_LACKS_LL_LITERA'

2016-05-09 Thread Mark Rogers
>If I got it right, you are trying to fix a rather informative compiler warning 
>for a specific compiler by not defining some symbols, 
>
Not exactly. The Windows Resource Compiler processes .RC resource files and not 
C/C++ files (but uses a C pre-processor). The code in 
podofo\src\doc\podofo-doc.rc looks like this

#define VER_PRODUCTVERSION      PODOFO_MAJOR,PODOFO_MINOR,PODOFO_REVISION,20
#define VER_PRODUCTVERSION_STR  PODOFO_VERSION_STRING

#define VER_FILEVERSION         VER_PRODUCTVERSION
#define VER_FILEVERSION_STR     VER_PRODUCTVERSION_STR

VS_VERSION_INFO VERSIONINFO
FILEVERSION     VER_FILEVERSION
PRODUCTVERSION  VER_PRODUCTVERSION
FILEFLAGSMASK   VS_FFI_FILEFLAGSMASK
FILEFLAGS       (VS_FF_PRIVATEBUILD|VER_PRERELEASE|VER_DEBUG)
FILEOS          VOS__WINDOWS32
FILETYPE        VER_FILETYPE
FILESUBTYPE     VFT2_UNKNOWN
BEGIN
    BLOCK "StringFileInfo"
    BEGIN
        BLOCK "040904E4"
        BEGIN
            VALUE "FileVersion", VER_FILEVERSION
            VALUE "ProductVersion", VER_PRODUCTVERSION
            VALUE "Comments", "PoDoFo Doc PDF Library\0"
            VALUE "CompanyName", "PoDoFo\0"
            VALUE "InternalName", "podofo\0"
            VALUE "ProductName", "PoDoFo\0"
            VALUE "LegalCopyright", "Copyright (C) 2010 Dominik Seichter, Craig Ringer, The PoDoFo Developers\0"
            VALUE "FileDescription", VER_FILEDESCRIPTION_STR
            VALUE "OriginalFilename", VER_ORIGINALFILENAME_STR
            VALUE "PrivateBuild", VER_PRIVATEBUILD_STR
        END
    END
END

The symbol in question is PODOFO_COMPILER_LACKS_LL_LITERAL which is only used 
by the C++ code to decide the type of C++ literals the C++ compiler supports. 
The Resource Compiler doesn't support C++ LL literals (or C++ code) and 
podofo-doc.rc doesn't use the PODOFO_COMPILER_LACKS_LL_LITERAL macro, but warns 
that the macro is too long (so the warning isn't useful)

The only macros podofo-doc.rc uses are the ones that define the PoDoFo version, 
which get included in the version resource above.

The reasons I chose __FILE__ were:

- __FILE__ was documented in "The C Programming Language" by Kernighan and Ritchie 
in 1988, so it's supported by very old compilers
- __FILE__ is guaranteed to be defined in C/C++ code by the C/C++ standard (first 
standardised in C89)
- __FILE__ is documented as undefined in the Windows Resource Compiler 
documentation
- PoDoFo C++ code won't compile if __FILE__ is undefined, since it's used by 
PODOFO_RAISE_ERROR (and several other places)

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 

-Original Message-
From: zyx [mailto:z...@litepdf.cz] 
Sent: 08 May 2016 17:59
To: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for warning RC4011: identifier truncated to 
'PODOFO_COMPILER_LACKS_LL_LITERA'

On Thu, 2016-05-05 at 10:56 -0500, Mark Rogers wrote:
> The following patch removes the long standing Resource Compiler 
> warning in Windows builds:

Hi,
thanks for the code snippet (not a patch, it looks slightly different). :)
 
> PdfCompilerCompat.h(110) : warning RC4011: identifier truncated to 
> 'PODOFO_COMPILER_LACKS_LL_LITERA'

If I got it right, you are trying to fix a rather informative compiler warning 
for a specific compiler by not defining some symbols, which are used in the 
code in various places, by checking some symbol that is probably defined in all 
(most) other compilers which can be used to compile the sources.

I do not think it's the way to go.

You should check for that particular compiler version, if you want to avoid the 
warning for it (see the top of the PdfCompilerCompat.h).

You shouldn't do that by not defining something which is used in the code (I 
know, rc.exe is used for the resource compiling only, but anyway). When the 
symbol (__FILE__) is not defined in the code-compiler too, then the code might 
not build.

There is a PODOFO_COMPILE_RC, which seems suitable for the issue you are trying 
to address.

Bye,
zyx

-- 
http://www.litePDF.cz i...@litepdf.cz
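zyx's suggestion can be sketched by keeping the structure of the original patch but keying the guard off PODOFO_COMPILE_RC instead of __FILE__. This assumes PODOFO_COMPILE_RC is defined only when rc.exe processes the headers (for example by podofo-doc.rc defining it before its includes); the thread doesn't show where PoDoFo actually defines it, so treat this as an illustration rather than the committed fix.

```cpp
// Sketch: hide the C++-only literal macros from the Windows resource compiler.
// Assumption: PODOFO_COMPILE_RC is defined only while rc.exe is running, so
// every real C/C++ compiler still sees these definitions.
#ifndef PODOFO_COMPILE_RC
/*
 * Some elderly compilers, notably VC6, don't support LL literals.
 * In those cases we can use the oversized literal without any suffix.
 */
#  if defined(PODOFO_COMPILER_LACKS_LL_LITERALS)
#    define PODOFO_LL_LITERAL(x) x
#    define PODOFO_ULL_LITERAL(x) x
#  else
#    define PODOFO_LL_LITERAL(x) x##LL
#    define PODOFO_ULL_LITERAL(x) x##ULL
#  endif
#endif
```

The advantage over testing __FILE__ is that the guard depends on a symbol PoDoFo controls, so no compiler-provided macro is being relied on to be absent.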


[Podofo-users] Underflows in PdfString::GetLength, PdfString::GetUnicodeLength, PdfString::GetCharacterLength

2016-05-07 Thread Mark Rogers
Hi

There are underflow conditions in PdfString::GetLength() and 
PdfString::GetUnicodeLength(): these return lengths of -2 and -1 if 
m_buffer.GetSize() is zero. That has the knock-on effect of mallocing (size_t)-1 or 
(size_t)-2 bytes in various ConvertToEncoding methods (4GB-1 or 4GB-2 when size_t is 
32-bit), which will usually fail and throw an ePdfError_OutOfMemory error.

Is the behaviour in PdfString intentional or should it be patched as below?


pdf_long PdfString::GetLength() const
{
    // patch? if ( m_buffer.GetSize() == 0 ) return 0;

    return m_buffer.GetSize() - 2;
}

pdf_long PdfString::GetCharacterLength() const
{
    return this->IsUnicode() ? this->GetUnicodeLength() : this->GetLength();
}

pdf_long PdfString::GetUnicodeLength() const
{
    // patch? if ( m_buffer.GetSize() == 0 ) return 0;

    return (m_buffer.GetSize() / sizeof(pdf_utf16be)) - 1;
}
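The arithmetic behind the underflow can be reproduced outside PoDoFo. This is a self-contained sketch, assuming pdf_long is a signed 64-bit integer and pdf_utf16be a 2-byte code unit (PoDoFo's real typedefs may differ by platform), and that the buffer size presumably includes a 2-byte terminator. The functions are hypothetical free functions over a buffer size, not PdfString members.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed typedefs for this sketch only.
using pdf_long = std::int64_t;
using pdf_utf16be = std::uint16_t;

// Unpatched arithmetic from the methods above, expressed over the buffer size.
pdf_long LengthUnpatched(std::size_t bufferSize) {
    return static_cast<pdf_long>(bufferSize) - 2;
}
pdf_long UnicodeLengthUnpatched(std::size_t bufferSize) {
    return static_cast<pdf_long>(bufferSize / sizeof(pdf_utf16be)) - 1;
}

// Guarded versions. This sketch checks `< 2` rather than `== 0`, so a 1-byte
// buffer cannot produce a negative length either.
pdf_long LengthPatched(std::size_t bufferSize) {
    return bufferSize < 2 ? 0 : LengthUnpatched(bufferSize);
}
pdf_long UnicodeLengthPatched(std::size_t bufferSize) {
    return bufferSize < 2 ? 0 : UnicodeLengthUnpatched(bufferSize);
}
```

Converting the unpatched result of -2 back to size_t gives SIZE_MAX-1, which is the huge allocation request the encoding classes end up making. Whether a PdfString can ever hold a 1-byte buffer isn't stated in the thread; the stricter guard simply costs nothing.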


Best Regards
Mark


Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



[Podofo-users] Patch for warning RC4011: identifier truncated to 'PODOFO_COMPILER_LACKS_LL_LITERA'

2016-05-05 Thread Mark Rogers
Hi

The following patch removes the long standing Resource Compiler warning in 
Windows builds:

PdfCompilerCompat.h(110) : warning RC4011: identifier truncated to 
'PODOFO_COMPILER_LACKS_LL_LITERA'


/*
 * Some elderly compilers, notably VC6, don't support LL literals.
 * In those cases we can use the oversized literal without any suffix.
 * The __FILE__ test stops a truncated macro warning in the Windows resource
 * compiler RC.exe.
 */
#if defined(__FILE__)
#  if defined(PODOFO_COMPILER_LACKS_LL_LITERALS)
#    define PODOFO_LL_LITERAL(x) x
#    define PODOFO_ULL_LITERAL(x) x
#  else
#    define PODOFO_LL_LITERAL(x) x##LL
#    define PODOFO_ULL_LITERAL(x) x##ULL
#  endif
#endif

RC.exe doesn't define __FILE__ when compiling - all C/C++ compilers do (and 
PoDoFo already has dependencies on __FILE__ in PODOFO_RAISE_ERROR and 
AddToCallstack calls)
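To see why C++ code depends on __FILE__, here is a hypothetical error macro that records its call site in the same spirit as PODOFO_RAISE_ERROR. The macro name and the use of std::runtime_error are assumptions for illustration; this is not PoDoFo's definition.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical macro (not PoDoFo's PODOFO_RAISE_ERROR): __FILE__ and __LINE__
// expand where the macro is used, so the message records the failure location.
#define RAISE_ERROR_SKETCH(msg)                                         \
    throw std::runtime_error(std::string(__FILE__) + ":" +              \
                             std::to_string(__LINE__) + ": " + (msg))

// Example caller: throws with its own file and line when the input is bad.
void CheckTrailer(bool ok) {
    if (!ok)
        RAISE_ERROR_SKETCH("invalid xref");
}
```

If a C/C++ preprocessor ever failed to predefine __FILE__, code like this would stop compiling, which is why treating "__FILE__ is defined" as "a real compiler is running" is reasonably safe.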

Cheers
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



[Podofo-users] Patches

2015-02-27 Thread Mark Rogers
Hi

Here are a couple of patches:

Patch 1
Fixes a problem with a previous patch I sent, which occurs when a PDF contains a 
mix of xref tables (ISO 32000-1 7.5.4) and XRefStm streams (ISO 32000-1 
7.5.8.1) in the Prev chain. This happens if a PDF has been through multiple 
tools, for example the W-9 PDF on irs.gov. PoDoFo currently assumes that all 
xref sections following an XRefStm are XRefStms as well (which is not 
always true).
Has been used in production since July 2014

Patch 2
Fixes a NULL pointer de-ref on some badly formed PDFs
Has been used in production since Sept 2014

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



patch2.diff
Description: patch2.diff


patch1.diff
Description: patch1.diff


Re: [Podofo-users] SVN commit 1587 broke ability to parse several PDFs

2014-07-02 Thread Mark Rogers
Hi

I finally had a chance to look at this – looks like there’s a long-standing bug 
in PdfParser::ReadXRefStreamContents

Once called, the method assumes that all cross-reference information found by 
following the “Prev” keys is stored as cross-reference streams (XRefStm). The IRS 
test documents use a mix of old-style cross-reference tables (xref) and 
cross-reference streams (XRefStm) in the Prev chain. I’m guessing they’ve been 
through a couple of different PDF editors.

PdfTokenizer::GetNextNumber() throws an error because, when it reads an xref 
table that it assumes is an XRefStm, the next token is “xref” instead of a number.
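The idea behind the fix can be sketched by classifying each section in the Prev chain on its own, instead of inheriting the type of the section that started the walk. Everything here is a hypothetical model: a map from byte offset to the first token at that offset plus its /Prev value, rather than PoDoFo's tokenizer-based parser. A classic table starts with the keyword "xref", while a cross-reference stream starts with an object header such as "7 0 obj".

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class XRefKind { Table, Stream };

// Hypothetical model of one cross-reference section in a PDF.
struct SectionSketch {
    std::string firstToken;  // "xref" for a classic table, an object number for a stream
    std::int64_t prev;       // /Prev offset, or -1 when absent
};

// Walks the Prev chain, deciding the kind of EACH section from its own first
// token. A visited set stops circular Prev chains; an unknown offset ends the walk.
std::vector<XRefKind> WalkPrevChain(const std::map<std::int64_t, SectionSketch>& sections,
                                    std::int64_t offset) {
    std::vector<XRefKind> kinds;
    std::set<std::int64_t> seen;
    while (offset >= 0 && seen.insert(offset).second) {
        auto it = sections.find(offset);
        if (it == sections.end())
            break;
        kinds.push_back(it->second.firstToken == "xref" ? XRefKind::Table
                                                        : XRefKind::Stream);
        offset = it->second.prev;
    }
    return kinds;
}
```

For a chain like stream, table, stream (the mixed case described above) the walk reports all three kinds correctly, where "assume everything after an XRefStm is a stream" would try to parse the table as a stream and hit the "xref" keyword.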

Given that fixing this might uncover more problems, and it’s very close to 
release day, I’d suggest keeping r1648 for the moment and I’ll submit a patch 
after the release.

Does that sound ok?

Cheers
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

From: Dennis Jenkins [mailto:dennis.jenkins...@gmail.com]
Sent: 30 June 2014 21:31
To: zyx
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] SVN commit 1587 broke ability to parse several PDFs


On Mon, Jun 30, 2014 at 3:10 PM, zyx <z...@litepdf.cz> wrote:

Hi,
thanks for the quick testing. I committed the patch as r1648 [1]. If
you find time to give it more thorough testing by Friday, then
that would be great (you know, just in case it has any side-effects).
Thanks again and bye,
zyx

[1] http://sourceforge.net/p/podofo/code/1648

Hello,
   r1648 works fine for me, for both my quick parser test and for my full suite 
of unit tests for my own project.  Thank you!


[Podofo-users] Patches for hangs and access violations

2014-04-01 Thread Mark Rogers
Hi

Here are three patches which fix a number of problems seen with PDF documents 
in the wild:

PdfPagesTree-1.patch
Fixes an access violation in PdfPagesTree::GetPageNode when the "Kids" array is 
missing

PdfPagesTree-2.patch
Fixes an infinite loop in PdfPagesTree::GetPageNodeFromArray when the "Kids" 
array is missing

PdfPage.patch
Fixes an access violation in PdfPage::GetPageNumber when the "Kids" array is 
missing
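The defensive checks these patches describe can be illustrated on a hypothetical in-memory page tree rather than PoDoFo's PdfPagesTree: a missing Kids array is treated as empty, and a visited set stops the walk from revisiting a node. The visited set is one way (an assumption, not necessarily what the patches do) to rule out infinite loops on malformed trees.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical page tree: a /Pages node id maps to its child ids. A /Pages node
// absent from `kids` models a node whose "Kids" array is missing.
struct PageTreeSketch {
    std::map<int, std::vector<int>> kids;  // /Pages nodes only
    std::set<int> leaves;                  // /Page objects
};

// Counts leaf pages reachable from `node`, treating a missing Kids array as
// empty and never visiting a node twice (so a cycle cannot loop forever).
int CountPages(const PageTreeSketch& tree, int node, std::set<int>& visited) {
    if (!visited.insert(node).second)
        return 0;  // already seen: cycle or shared node
    if (tree.leaves.count(node))
        return 1;
    auto it = tree.kids.find(node);
    if (it == tree.kids.end())
        return 0;  // missing "Kids": nothing below this node
    int total = 0;
    for (int child : it->second)
        total += CountPages(tree, child, visited);
    return total;
}
```

This recursive version still uses stack proportional to tree depth; a very deep malformed tree would need an explicit stack or a depth limit as well.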

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



PdfPagesTree-2.patch
Description: PdfPagesTree-2.patch


PdfPage.patch
Description: PdfPage.patch


PdfPagesTree-1.patch
Description: PdfPagesTree-1.patch


[Podofo-users] Patch to support PDF XRefStm

2014-04-01 Thread Mark Rogers
Hi

PoDoFo 0.9.2 and earlier don't support cross reference streams (XRefStm) which 
means it can't correctly read some PDFs. For example, 0.9.2 can't resolve the 
StructTreeRoot reference in this document:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf

The attached patch adds support for XRefStm to PdfParser::ReadNextTrailer() by 
adding code between

MergeTrailer( &trailer );
and
if( trailer.GetDictionary().HasKey( "Prev" ) )

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL



PdfParser.cpp.patch
Description: PdfParser.cpp.patch


[Podofo-users] PdfMemDocument sometimes doesn't load object streams (ObjStm)

2013-07-10 Thread Mark Rogers
Hi

This looks like a bug. PoDoFo 0.9.2 PdfMemDocument sometimes doesn't load 
objects contained inside object streams (/Type ObjStm)

I've seen this in several files, but this is a simple example:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf

1) If you view this PDF file using PoDoFoBrowser, it only shows about 113 
objects loaded (there's a gap in object numbers between obj 73 and obj 710)
2) If you uncompress the object streams in PDFOpenParameters.pdf using pdftk, 
then view the file in PoDoFoBrowser it shows 740 objects loaded (with no gap in 
the object numbers). 

Objects contained in the object streams are displayed by several other PDF 
tools (including Adobe Reader) so I'm inclined to think the problem is in 
PoDoFo.

Here's what I've found so far:

1) PdfParser::ReadObjectsInternal() only calls 
PdfParser::ReadObjectFromStream() if m_offsets[i].cUsed == 's'
2) m_offsets[i].cUsed = 's' is set in 
PdfXRefStreamParserObject::ReadXRefStreamEntry
3) ReadXRefStreamEntry is called (indirectly) by 
PdfParser::ReadXRefStreamContents
4) ReadXRefStreamContents is never called when PDFOpenParameters.pdf is read, 
but I can see an XRefStm when displaying the PDF in a text editor 
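For context on steps 2 and 3: in a cross-reference stream (ISO 32000-1, 7.5.8.3) each entry is three big-endian fields whose byte widths come from the /W array, and type 2 marks an object stored inside an object stream, which is presumably what the 's' value records. A minimal decoding sketch with hypothetical names (XRefEntrySketch, DecodeXRefStreamEntry); it is not PoDoFo's ReadXRefStreamEntry.

```cpp
#include <cstdint>

// One decoded cross-reference stream entry. The 'f' / 'n' / 's' letters mirror
// the cUsed values discussed above.
struct XRefEntrySketch {
    char used;             // 'f' free, 'n' uncompressed object, 's' inside an object stream
    std::uint64_t field2;  // next free object number / byte offset / object stream number
    std::uint64_t field3;  // generation number / generation number / index in object stream
};

// Reads `width` bytes, high-order byte first, and advances p. Width 0 yields 0.
static std::uint64_t ReadBigEndian(const std::uint8_t*& p, int width) {
    std::uint64_t value = 0;
    for (int i = 0; i < width; ++i)
        value = (value << 8) | *p++;
    return value;
}

XRefEntrySketch DecodeXRefStreamEntry(const std::uint8_t* p, const int w[3]) {
    // A zero width for the type field means the type defaults to 1.
    const std::uint64_t type = (w[0] == 0) ? 1 : ReadBigEndian(p, w[0]);
    const std::uint64_t f2 = ReadBigEndian(p, w[1]);
    const std::uint64_t f3 = ReadBigEndian(p, w[2]);
    if (type == 1) return {'n', f2, f3};
    if (type == 2) return {'s', f2, f3};
    // Type 0 is a free entry; the spec treats any other type as a reference
    // to the null object, which this sketch also reports as 'f'.
    return {'f', f2, f3};
}
```

If the cross-reference stream is never read (step 4), no entry is ever decoded as type 2, so nothing gets marked 's' and the objects inside the object streams are skipped, which matches the gap seen in PoDoFoBrowser.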

I'm happy to try to produce a patch, but some pointers on where to start 
looking for the cause would be much appreciated.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 



[Podofo-users] Patch for heap corruption

2013-02-22 Thread Mark Rogers
Hi

Here’s a patch for a heap corruption issue experienced when reading some 
damaged PDF documents:

/src/base/PdfXRefStreamParserObject.cpp

Lines 137-142 were:
//printf("nCount=%i ", static_cast(nCount));
//printf("pBuffer=%li ", (long)(pBuffer - pStart));
//printf("pEnd=%li ", lBufferLen);
if( ! (*m_pOffsets)[static_cast(nFirstObj)].bParsed )
    ReadXRefStreamEntry( pBuffer, lBufferLen, nW, static_cast(nFirstObj) );

An additional line is added to make sure nFirstObj is in range:

//printf("nCount=%i ", static_cast(nCount));
//printf("pBuffer=%li ", (long)(pBuffer - pStart));
//printf("pEnd=%li ", lBufferLen);
if ( nFirstObj >= 0 && nFirstObj < m_pOffsets->size() )
    if( ! (*m_pOffsets)[static_cast(nFirstObj)].bParsed )
        ReadXRefStreamEntry( pBuffer, lBufferLen, nW, static_cast(nFirstObj) );
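The range check in the patch can be factored into a small helper. This is a sketch with hypothetical names (OffsetEntrySketch, IsObjectIndexInRange): the object number comes from the PDF being parsed, so it must be checked before indexing, and testing the sign first means the cast to an unsigned type cannot turn a negative value into a huge in-range-looking one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an xref offset entry.
struct OffsetEntrySketch {
    bool parsed = false;
};

// True only if objNo can safely index `offsets`. The sign test comes first so
// the unsigned comparison below never sees a negative number.
bool IsObjectIndexInRange(const std::vector<OffsetEntrySketch>& offsets,
                          std::int64_t objNo) {
    return objNo >= 0 && static_cast<std::uint64_t>(objNo) < offsets.size();
}
```

Making the cast explicit also avoids the signed/unsigned comparison warning that `nFirstObj < m_pOffsets->size()` can trigger on some compilers.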

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

From: cybevnm [mailto:cybe...@gmail.com]
Sent: 19 February 2013 18:45
To: Leonard Rosenthol; podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] CID fonts subsettting

There is a problem - I'm not sure how to do font subsetting for a Unicode-aware 
TTF font using the PoDoFo API.
It seems there is no way to ask for a TTF font with subsetting enabled using 
the ordinary PdfDocument.CreateFont(). 
EFontCreationFlags::eFontCreationFlags_Type1Subsetting isn't about TTF; 
PdfDocument.CreateFontSubset just pulls every glyph into the subset (i.e. does the 
usual full embedding).
So, how can I force PoDoFo to create a TTF font with subsetting enabled? (Or am I 
just moving in the wrong direction?)


2/19/2013 2:12 AM, Leonard Rosenthol wrote:
If you are planning to support a wide set of potential languages with a single 
font, then yes, subset embedding is the correct solution.

IIRC, PoDoFo supports subsetting for TrueType fonts only.  If that font is a 
TTF, you should be fine.

Leonard

From: cybevnm <cybe...@gmail.com>
Date: Monday, February 18, 2013 5:09 PM
To: Leonard Rosenthol <lrose...@adobe.com>
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] CID fonts subsettting

I want to generate a PDF document which contains strings in several languages. 
The set of languages (i.e. the set of required characters) can be determined only 
at generation time, so I've decided to use a font which contains as many 
characters as possible (http://en.wikipedia.org/wiki/GNU_Unifont in my case), so 
that a PDF viewer can render any possible input language when displaying the 
document.
GNU Unifont isn't common on clients' workstations (which mostly run 
MS Windows), so I need to do some form of font embedding. But I 
have a conflicting requirement - documents should be as small as possible, so 
full font embedding isn't an option (3 Mbytes of overhead per document is too 
much for me).
I think the only possible option in my situation is to embed only those glyphs 
which are actually used in the document - this is what I mean by the term 'font 
subsetting'.
Is it possible to do this using PoDoFo? Or maybe there are other ways to 
satisfy the requirements?


On 02/18/2013 11:22 PM, Leonard Rosenthol wrote:

You don't actually subset a CID font - since there is no such thing as a
CID font file.

A CID font is really a special way to encode/store font data in a PDF file
when having to deal with Unicode-based fonts. You can create a CID font
from a TTF or OTF font as part of subsetting it - though it's not required.

So backing this up...

What are you trying to achieve with PoDoFo that is blocking you?

Leonard

On 2/18/13 2:42 PM, "cybevnm" <cybe...@gmail.com> wrote:

Hi!

After some searching through the podofo code I suspect that subsetting
for CID fonts isn't implemented (PdfFontCID doesn't implement
AddUsedSubsettingGlyphs etc.) and can be done only for PdfFontType1
fonts (which don't support Unicode). But there are the (to me) mysterious
PdfFontTTFSubset and PdfFontCache::GetFontSubset.

So, is CID font (Unicode-aware) subsetting supported, and what are
PdfFontTTFSubset and PdfFontCache::GetFontSubset for?

Thanks!


Re: [Podofo-users] Patch for stack overflow

2012-07-16 Thread Mark Rogers
Great - checked SVN against my source - all the changes look good.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 


-Original Message-
From: Dominik Seichter [mailto:domseich...@googlemail.com] 
Sent: 15 July 2012 15:30
To: Mark Rogers
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for stack overflow

Hi Mark,

Thanks. So I think I got this right now. I've committed both of your changes. 
Maybe you want to check if everything is correct.

Regards,
 Dom



Re: [Podofo-users] Patch for stack overflow

2012-07-15 Thread Mark Rogers
Hi Dom

Looks like line numbers for this one were out due to the previous patch:

The lines of code starting 
 const int maxReadNextTrailerLevel = 500
go just after the opening brace of PdfParser::ReadNextTrailer()

and the following line
 --m_nReadNextTrailerLevel;
goes just before the closing brace of PdfParser::ReadNextTrailer()

Worth noting that no attempt is made to decrement m_nReadNextTrailerLevel when 
exceptions are thrown - I've assumed exceptions are all fatal and cause the 
parser to abort. If this is a faulty assumption let me know and I can look at 
this in more detail.
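If exceptions turn out not to be fatal, an RAII guard keeps the counter balanced on every exit path, including stack unwinding. A minimal sketch with hypothetical names (RecursionGuard, ParserSketch); std::runtime_error stands in for PODOFO_RAISE_ERROR( ePdfError_InvalidXRef ), and the limit of 500 is the one from the patch.

```cpp
#include <stdexcept>

// Increments a depth counter on entry and decrements it on every exit path,
// including when an exception propagates out of the guarded scope.
class RecursionGuard {
public:
    RecursionGuard(int& depth, int maxDepth) : depth_(depth) {
        if (++depth_ > maxDepth) {
            --depth_;  // the destructor won't run if the constructor throws
            throw std::runtime_error("trailer chain too deep (possible circular reference)");
        }
    }
    ~RecursionGuard() { --depth_; }
    RecursionGuard(const RecursionGuard&) = delete;
    RecursionGuard& operator=(const RecursionGuard&) = delete;

private:
    int& depth_;
};

struct ParserSketch {
    int depth = 0;
    static constexpr int kMaxDepth = 500;

    // Stand-in for ReadNextTrailer(): recurses `remaining` more times.
    void ReadNextTrailer(int remaining) {
        RecursionGuard guard(depth, kMaxDepth);
        if (remaining > 0)
            ReadNextTrailer(remaining - 1);
    }
};
```

Because the guard undoes its own increment before throwing and every enclosing guard decrements during unwinding, the counter is back at zero after a failed parse, so the same parser object could safely be reused.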

I'm happy to check the patched files against my version if you email me them, 
or let me know when they're committed to SVN

Best Regards
Mark

-Original Message-
From: Dominik Seichter [mailto:domseich...@googlemail.com] 
Sent: 15 July 2012 08:36
To: Mark Rogers
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for stack overflow

Hi Mark,

I finally found some time to look at some PoDoFo patches. Thanks for the patch. 
This sounds very useful!
I have a little trouble applying this, though.

Where is this part supposed to go? At the end of which method? Line
540 in my version of the file does not make much sense. I think it should be at 
the end of ReadNextTrailer(), right?

> .540 added
> +   --m_nReadNextTrailerLevel;
> }
>

Regards,
 Dom

On Wed, Jun 27, 2012 at 4:52 PM, Mark Rogers  
wrote:
> Found some more PDF documents in the wild which cause problems - recursive stack 
> overflow in this case, due to circular cross references in the trailer. Worth 
> saying that the library is generally very stable - but I'm pumping lots of 
> PDFs from different sources through it, so I'm seeing some unusual edge cases.
>
> Here's a patch that limits the recursion depth when reading the 
> trailer
>
> PdfParser.h
> .577 added
> +int   m_nReadNextTrailerLevel;
>
> PdfParser.cpp
> void PdfParser::Init()
> {
> .127 added
> +   m_nReadNextTrailerLevel = 0;
> }
>
> PdfParser::ReadNextTrailer()
> {
> .493 added
> +   // be careful changing this limit - overflow limits depend on the OS, 
> linker settings, and how much stack space compiler allocates
> +   // 500 limit prevents overflow on Win7 with VC++ 2005 with default 
> linker stack size (1000 caused overflow with same compiler/OS)
> +   const int maxReadNextTrailerLevel = 500;
> +
> +   ++m_nReadNextTrailerLevel;
> +
> +   if ( m_nReadNextTrailerLevel > maxReadNextTrailerLevel )
> +   {
> +       // avoid stack overflow on documents that have circular cross references in trailer
> +       PODOFO_RAISE_ERROR( ePdfError_InvalidXRef );
> +   }
>
> .540 added
> +   --m_nReadNextTrailerLevel;
> }
>
> Best Regards
> Mark
>
> Mark Rogers - mark.rog...@powermapper.com PowerMapper Software Ltd - 
> www.powermapper.com Registered in Scotland No 362274 Quartermile 2 
> Edinburgh EH3 9GL
>
>
> ___
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users



Re: [Podofo-users] Patch for performance issue

2012-07-15 Thread Mark Rogers
It goes into PdfParser::ReadDocumentStructure():

else
{
    PdfError::LogMessage( eLogSeverity_Warning, "PDF Standard Violation: No /Size key was specified in the trailer directory. Will attempt to recover." );
    // Treat the xref size as unknown, and expand the xref dynamically as we read it.
    m_nNumObjects = 0;
}

// newcode start
// allow caller to specify a max object count to avoid very slow load times on large documents
if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
    PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );
// newcode end

if (m_nNumObjects > 0)
    m_offsets.resize(m_nNumObjects);

The intention of this placement was to do the check before m_offsets is resized 
(which may allocate a large chunk of memory if m_nNumObjects is a big number).
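A standalone sketch of the same ordering, under stated assumptions: `ParserSketch`, `PrepareOffsets` and `OffsetCount` are hypothetical names, and `std::length_error` stands in for `PODOFO_RAISE_ERROR_INFO`. As in the patch, the limit is a caller-settable static where `LONG_MAX` means "no limit", and an oversized count is rejected before any memory is reserved.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical parser fragment illustrating "check, then allocate".
class ParserSketch {
public:
    static long GetMaxObjectCount() { return s_maxObjects; }
    static void SetMaxObjectCount(long n) { s_maxObjects = n; }

    // Validate the declared object count before resizing, so a huge
    // count is rejected without allocating the offset table.
    void PrepareOffsets(long numObjects) {
        if (s_maxObjects != LONG_MAX && numObjects > s_maxObjects)
            throw std::length_error("object count exceeds configured maximum");
        if (numObjects > 0)
            m_offsets.resize(static_cast<std::size_t>(numObjects));
    }

    std::size_t OffsetCount() const { return m_offsets.size(); }

private:
    static long s_maxObjects;   // LONG_MAX = unlimited (default behaviour unchanged)
    std::vector<long> m_offsets;
};

long ParserSketch::s_maxObjects = LONG_MAX;
```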

PS Line numbers in my patches all refer to PoDoFo 0.9.1

Best Regards
Mark

-Original Message-
From: Dominik Seichter [mailto:domseich...@googlemail.com] 
Sent: 15 July 2012 08:40
To: Mark Rogers
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for performance issue

Hi Mark,

I need some context again:

> .293 added
> // allow caller to specify a max object count to avoid very slow load times on large documents
> if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
>     PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );

Which method should this go to?

Cheers,
 Dominik

On Thu, Jun 21, 2012 at 1:43 PM, Mark Rogers  
wrote:
> Hi
>
> A while back I posted about a problem loading a large PDF document into 
> PoDoFo. The document in question was fairly unusual (it's a 700 page list of 
> pharmacies in North America) but took 15 minutes to load and allocated 800MB 
> of working set before throwing an out of memory error.
>
> Problem is due to:
>
> a) large number of objects (about 450,000) in document
> b) short byte sequences in the source document turning into 40-100 
> byte PdfObjects in memory (which turns a 20MB document on disk into 
> 800MB in memory)
>
> There's no easy fix without major refactoring, and the document in question 
> is pretty unusual, so a workaround seems in order. The workaround provides a 
> way for the caller to specify max number of objects to load (an exception is 
> thrown if object limit is exceeded when reading header). If the caller 
> doesn't specify an object limit the behaviour is unchanged from previous 
> versions.
>
> PdfParser.h
>
> .370 added
>
>/**
>  * \return maximum object count to read (default is LONG_MAX
>  * which means no limit)
>  */
> inline static long GetMaxObjectCount();
>
> /**
>  * Specify the maximum number of objects the parser should
>  * read. An exception is thrown if document contains more
>  * objects than this. Use to avoid problems with very large
>  * documents with millions of objects, which use 500MB of
>  * working set and spend 15 mins in Load() before throwing
>  * an out of memory exception.
>  *
>  * \param nMaxObjects set max number of objects
>  */
> inline static void SetMaxObjectCount( long nMaxObjects );
>
> .538 added
> static long   s_nMaxObjects;
>
> .641 added
> // -
> //
> // -
> long PdfParser::GetMaxObjectCount()
> {
> return s_nMaxObjects;
> }
>
> // -
> //
> // -
> void PdfParser::SetMaxObjectCount( long nMaxObjects ) {
> s_nMaxObjects = nMaxObjects;
> }
>
> PdfParser.cpp
>
> .51 added
> long PdfParser::s_nMaxObjects = LONG_MAX;
>
> .293 added
> // allow caller to specify a max object count to avoid very slow load times on large documents
> if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
>     PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );
>
> Best Regards
> Mark
>
> Mark Rogers - mark.rog...@powermapper.com PowerMapper Software Ltd - 
> www.powermapper.com Registered in Scotland No 362274 Quartermile 2 
> Edinburgh EH3 9GL
>
>
>

[Podofo-users] Patch for stack overflow

2012-06-27 Thread Mark Rogers
Found some more PDF documents in the wild which cause problems - recursive stack 
overflow in this case due to circular cross references in the trailer. Worth 
saying that the library is generally very stable - but I'm pumping lots of PDFs 
from different sources through it, so I'm seeing some unusual edge cases.

Here's a patch that limits the recursion depth when reading the trailer:

PdfParser.h
.577 added
+int   m_nReadNextTrailerLevel;

PdfParser.cpp
void PdfParser::Init() 
{
.127 added
+   m_nReadNextTrailerLevel = 0;
}

PdfParser::ReadNextTrailer()
{
.493 added
+   // be careful changing this limit - overflow limits depend on the OS, linker settings, and how much stack space compiler allocates
+   // 500 limit prevents overflow on Win7 with VC++ 2005 with default linker stack size (1000 caused overflow with same compiler/OS)
+   const int maxReadNextTrailerLevel = 500;
+
+   ++m_nReadNextTrailerLevel;
+
+   if ( m_nReadNextTrailerLevel > maxReadNextTrailerLevel )
+   {
+       // avoid stack overflow on documents that have circular cross references in trailer
+       PODOFO_RAISE_ERROR( ePdfError_InvalidXRef );
+   }

.540 added
+   --m_nReadNextTrailerLevel;
}
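The comment in the patch notes that the safe depth depends on the OS, linker settings and compiler. A different technique (not what this patch does) is to detect the cycle directly: remember the offsets of trailers already visited, stop when a /Prev offset repeats, and walk the chain iteratively instead of recursively. A sketch under the assumption that the chain has already been reduced to a map from each trailer offset to its /Prev offset; `PrevMap` and `WalkTrailerChain` are hypothetical.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical model of an xref/trailer chain: each trailer offset maps
// to the offset named by its /Prev key, or -1 if there is no /Prev.
using PrevMap = std::map<long, long>;

// Walk the /Prev chain iteratively, recording visited offsets so a
// circular chain is reported instead of recursing until the stack overflows.
std::vector<long> WalkTrailerChain(const PrevMap& prev, long start) {
    std::vector<long> order;
    std::set<long> visited;
    long offset = start;
    while (offset >= 0) {
        if (!visited.insert(offset).second)
            throw std::runtime_error("circular /Prev reference in trailer chain");
        order.push_back(offset);
        auto it = prev.find(offset);
        if (it == prev.end())
            throw std::runtime_error("trailer offset not found");
        offset = it->second;
    }
    return order;
}
```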

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 




[Podofo-users] Patches for access violations

2012-06-21 Thread Mark Rogers
Hi

Some PDF documents found in the wild cause NULL pointer dereferences in PoDoFo, 
but load correctly in Chrome/Adobe Reader, so here are some patches to fix 
these:

PdfPage.cpp
unsigned int PdfPage::GetPageNumber() const

.494
-   const PdfArray& kids= pParent->GetIndirectKey( "Kids" )->GetArray();

+   PdfObject* pKids = pParent->GetIndirectKey( "Kids" );
+   if ( pKids != NULL )
+   {
+   const PdfArray& kids= pKids->GetArray();


.501
-   if( pNode->GetDictionary().GetKey( PdfName::KeyType )->GetName() == PdfName( "Pages" ) )
+   if( pNode->GetDictionary().GetKey( PdfName::KeyType ) != NULL && pNode->GetDictionary().GetKey( PdfName::KeyType )->GetName() == PdfName( "Pages" ) )


.510
+   }

PdfPagesTree.cpp
PdfObject* PdfPagesTree::GetPageNode

.230
-   if( !pObj->IsArray() )
+   if( pObj == NULL || !pObj->IsArray() )
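These fixes share one pattern: a key lookup returns NULL when the key is missing, so the pointer has to be tested before it is dereferenced. A standalone sketch of that pattern; `Dict`, `GetKey` and `IsPagesNode` are hypothetical stand-ins, not the PoDoFo `PdfDictionary` API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a PDF dictionary: key -> value.
using Dict = std::map<std::string, std::string>;

// Returns a pointer to the value, or nullptr when the key is missing,
// mirroring lookups that return NULL for absent keys.
const std::string* GetKey(const Dict& d, const std::string& key) {
    auto it = d.find(key);
    return it == d.end() ? nullptr : &it->second;
}

// Unsafe form: *GetKey(node, "Type") == "Pages" crashes when /Type is absent.
// Safe form: test the pointer first, then compare.
bool IsPagesNode(const Dict& node) {
    const std::string* type = GetKey(node, "Type");
    return type != nullptr && *type == "Pages";
}
```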

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 




[Podofo-users] Patch for performance issue

2012-06-21 Thread Mark Rogers
Hi

A while back I posted about a problem loading a large PDF document into PoDoFo. 
The document in question was fairly unusual (it's a 700 page list of pharmacies 
in North America) but took 15 minutes to load and allocated 800MB of working 
set before throwing an out of memory error.

Problem is due to:

a) large number of objects (about 450,000) in document
b) short byte sequences in the source document turning into 40-100 byte 
PdfObjects in memory (which turns a 20MB document on disk into 800MB in memory)

There's no easy fix without major refactoring, and the document in question is 
pretty unusual, so a workaround seems in order. The workaround provides a way 
for the caller to specify max number of objects to load (an exception is thrown 
if object limit is exceeded when reading header). If the caller doesn't specify 
an object limit the behaviour is unchanged from previous versions.

PdfParser.h

.370 added

   /**
 * \return maximum object count to read (default is LONG_MAX
 * which means no limit)
 */
inline static long GetMaxObjectCount();

/**
 * Specify the maximum number of objects the parser should
 * read. An exception is thrown if document contains more
 * objects than this. Use to avoid problems with very large 
 * documents with millions of objects, which use 500MB of 
 * working set and spend 15 mins in Load() before throwing 
 * an out of memory exception.
 *
 * \param nMaxObjects set max number of objects
 */
inline static void SetMaxObjectCount( long nMaxObjects );

.538 added
static long   s_nMaxObjects;

.641 added
// -
// 
// -
long PdfParser::GetMaxObjectCount()
{
return s_nMaxObjects;
}

// -
// 
// -
void PdfParser::SetMaxObjectCount( long nMaxObjects )
{
s_nMaxObjects = nMaxObjects;
}

PdfParser.cpp

.51 added
long PdfParser::s_nMaxObjects = LONG_MAX;

.293 added
// allow caller to specify a max object count to avoid very slow load times on large documents
if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
    PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 





[Podofo-users] High CPU and memory consumption for 11MB PDF

2012-04-27 Thread Mark Rogers
Hello

Currently using 0.9.1 on Windows - I have a problem with the following 11 MB 
document:
https://www.bcbsri.com/BCBSRIWeb/plansandservices/pdf/NationalNetworkPharmacies.pdf

When this is loaded using PdfMemDocument::Load, it takes 15 minutes to load at 
50% CPU and allocates about 800 MB of RAM. The same issue happens on a Windows 
7 desktop system and a Windows 2008 server. Memory consumption drops by 800MB 
when the ~PdfMemDocument destructor is called, so it doesn't look like a leak.

It's a 700 page document with a lot of objects in it - PdfParser.m_nNumObjects 
= 469094 

(bLoadOnDemand = true in PdfParser::ParseFile if that's important)

Has anyone any insights into the problem? I'm happy to dig in and provide a 
patch if I can work out what's wrong.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 




Re: [Podofo-users] A Question about PDF Form rendering

2011-06-16 Thread Mark Rogers
Hi

I needed to get access to the PDF document language id (the equivalent of the HTML lang attribute) and have the following patch for PdfMemDocument.h:

/** Get access to the RFC 3066 natural language id for the document (ISO 
32000-1:2008 14.9.2.1)
 *  \returns PdfObject the language ID string
 */
PdfObject* GetLanguage() const { return GetNamedObjectFromCatalog( "Lang" 
); }

Do simple accessors like this need a unit test?

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com 
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL 






[Podofo-users] Potential problem with Windows DLL

2010-06-25 Thread Mark Rogers
Hi

I'm not sure if this is a problem or just a restriction. There are heap corruption 
problems with std::strings passed across the PoDoFo interface when the 
following conditions are true:


-  PoDoFo is compiled as a Windows DLL

-  The shared C runtime DLL isn't being used for both caller and PoDoFo 
(i.e. not an issue when both use the shared runtime)

Here's an example of what goes wrong:

void Caller_method()
{
    PoDoFo::TKeyMap& keys = token.GetDictionary().GetKeys();
    for ( PoDoFo::TKeyMap::iterator it = keys.begin(); it != keys.end(); ++it )
    {
        PoDoFo::PdfObject* pValue = it->second;
        string strValue;  // no memory allocated for strValue.c_str() yet

        // strValue is allocated in PoDoFo!_crtheap by ToString
        // memory only allocated if strValue.size() > 15 (a char[16] buffer, including the terminator, holds short strings)
        pValue->ToString( strValue );

        // strValue destructor fires and tries to free strValue.c_str() from caller!_crtheap using
        // HeapFree(_crtheap, strValue.data() )
        // this only works when both caller and callee share the same CRT heap (i.e. both use the shared DLL CRT)
    }
}

Simple assignments suffer the same issue in Visual C++ 6.0, since its std::string 
shares ref-counted string buffers between std::string instances (there may not be 
many people left using that, though).

It looks like podofo_malloc and podofo_free avoid this issue for explicitly 
allocated memory, but something similar would need to be done for std::string (a 
custom STL allocator should do the trick, but that could break a lot of 
existing code).

The other alternative is to require apps to use the shared CRT, but that pulls 
in other dependencies (e.g. installer merge modules and the broken SxS 
deployment model).
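One common way to sidestep the cross-heap problem is to avoid passing heap-owning objects across the DLL boundary at all: the caller supplies the buffer and the DLL only reports the size it needs (a two-call pattern), so every allocation is freed by the heap that made it. A minimal sketch; `GetValueString` and `CallerGetValue` are hypothetical, and the fixed string stands in for the output of `PdfObject::ToString`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical DLL-boundary function. The caller owns the buffer, so the
// caller never frees memory allocated by the DLL's CRT heap.
// Returns the bytes needed (including the terminating NUL) and copies
// only if bufSize is large enough.
extern "C" std::size_t GetValueString(char* buf, std::size_t bufSize) {
    const char* value = "/Type /Page";   // stands in for ToString() output
    std::size_t needed = std::strlen(value) + 1;
    if (buf != nullptr && bufSize >= needed)
        std::memcpy(buf, value, needed);
    return needed;
}

// Caller side: query the size, then let the caller's own std::string
// allocate from the caller's heap.
std::string CallerGetValue() {
    std::size_t needed = GetValueString(nullptr, 0);
    std::string result(needed, '\0');
    GetValueString(&result[0], result.size());
    result.resize(needed - 1);           // drop the terminating NUL
    return result;
}
```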

Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com
Quartermile 2 Edinburgh EH3 9GL Registered in Scotland No 362274
Phone +44 845 056 8475




[Podofo-users] Fixes for access violations

2010-06-02 Thread Mark Rogers
Hi

I've got a couple of fixes for access violations in PDFs found in the wild:

PdfPagesTree.cpp
PdfPagesTree::GetPage crashes on PDFs where the value of the "Count" key 
returned by GetTotalNumberOfPages is bigger than the actual number of pages in 
the tree.
Changes:
 97 Change: if( pPage->GetObject()->Reference() == ref )
To: if( pPage != NULL && pPage->GetObject()->Reference() == ref )

PdfAnnotation.cpp
Out by one error in s_lNumActions (count includes terminating NULL entry, which 
looks wrong and faults in PdfElement::TypeNameToIndex if name isn't found). Bug 
fires if an annotation name doesn't match any of the annotation actions in the 
table.
Changes:
 35 Change: const long  PdfAnnotation::s_lNumActions = 26;
To : const long  PdfAnnotation::s_lNumActions = 25;

BTW The same issue affects PdfAction::s_lNumActions

PdfElement.cpp
PdfElement::TypeNameToIndex faults if ppTypes[] contains sentinel NULL entries 
and name isn't found
Changes:
98 Change: if( strcmp( pszType, ppTypes[i] ) == 0 )
To: if( ppTypes[i] != NULL && strcmp( pszType, ppTypes[i] ) == 0 )
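The off-by-one comes from counting the terminating NULL sentinel as a real entry. A sketch of deriving the count from the array itself, while keeping the NULL check in the lookup as a second line of defence; the shortened table and this `TypeNameToIndex` signature are hypothetical, not PoDoFo's.

```cpp
#include <cstring>

// Hypothetical, shortened type-name table terminated by a NULL sentinel.
static const char* s_names[] = { "Text", "Link", "FreeText", nullptr };

// Number of real entries: array length minus the sentinel. A hard-coded
// count that includes the sentinel lets the lookup reach the NULL entry.
static const long s_numNames =
    static_cast<long>(sizeof(s_names) / sizeof(s_names[0])) - 1;

// Returns the index of pszType in ppTypes, or -1 if not found.
// The NULL check keeps the lookup safe even if lLen counts the sentinel.
long TypeNameToIndex(const char* pszType, const char** ppTypes, long lLen) {
    if (pszType == nullptr)
        return -1;
    for (long i = 0; i < lLen; ++i) {
        if (ppTypes[i] != nullptr && std::strcmp(pszType, ppTypes[i]) == 0)
            return i;
    }
    return -1;
}
```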

Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com
Quartermile 2 Edinburgh EH3 9GL Registered in Scotland No 362274
Phone +44 845 056 8475




[Podofo-users] PODOFO_API and podofo_free in 0.8.0

2010-04-30 Thread Mark Rogers
My reading of the 0.8.0 documentation is that podofo_free should be used by 
callers to release memory allocated by podofo_malloc (e.g. the buffer returned 
by GetFilteredCopy).

This is pretty much essential when podofo is used as a Windows DLL since 
calling C runtime free() on the returned buffer isn't guaranteed to work across 
DLL boundaries.

When compiled as a static library everything works, but when compiled as a 
Windows DLL there are linker errors because podofo_free isn't exported as a 
public symbol (because there are no PODOFO_API prefixes on podofo_free, 
podofo_realloc and podofo_malloc)

If my understanding is correct here's a patch:

PdfMemoryManagement.h - add PODOFO_API
PODOFO_API void* podofo_malloc( size_t size );
PODOFO_API void* podofo_realloc( void* buffer, size_t size );
PODOFO_API void podofo_free( void* buffer );

PdfMemoryManagement.cpp
Add #include "PdfDefines.h" before first include.
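For reference, a typical shape for an export macro like PODOFO_API. `MYLIB_API`, `MYLIB_EXPORTS` and `MYLIB_STATIC` are hypothetical names (PoDoFo defines its own macro in its headers), and the allocator bodies are plain wrappers over the C runtime, assumed rather than copied from PdfMemoryManagement.cpp.

```cpp
#include <cstddef>
#include <cstdlib>

// This single file plays the role of the library's own source file,
// so it defines the "building the DLL" symbol before the macro logic.
#define MYLIB_EXPORTS

// Hypothetical export macro.
#if defined(_WIN32) && !defined(MYLIB_STATIC)
  #if defined(MYLIB_EXPORTS)            // defined while building the DLL
    #define MYLIB_API __declspec(dllexport)
  #else                                 // seen by code that uses the DLL
    #define MYLIB_API __declspec(dllimport)
  #endif
#elif defined(__GNUC__)
  #define MYLIB_API __attribute__((visibility("default")))
#else
  #define MYLIB_API
#endif

// Declarations (header): the macro makes the allocators visible outside
// the DLL, so callers can free buffers with the CRT that allocated them.
MYLIB_API void* mylib_malloc(std::size_t size);
MYLIB_API void* mylib_realloc(void* buffer, std::size_t size);
MYLIB_API void  mylib_free(void* buffer);

// Definitions (library .cpp).
void* mylib_malloc(std::size_t size) { return std::malloc(size); }
void* mylib_realloc(void* buffer, std::size_t size) { return std::realloc(buffer, size); }
void  mylib_free(void* buffer) { std::free(buffer); }
```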

Only tested on Visual Studio - don't have a linux tool chain currently (but 
looks correct after a quick scan of the GCC documentation).

PS Congratulations on getting the 0.8.0 release out - worked very smoothly 
apart from this issue.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com 





[Podofo-users] Possible bug: error thrown reading PDF version of "ISO Standard 32000: Portable Document Format"

2010-03-10 Thread Mark Rogers
There may be an issue in v0.7.0 of the library (sorry - don't have a trunk 
version from SVN currently, so unable to check that). 

There's a PDF on adobe.com which is a copy of ISO Standard 32000: Portable 
Document Format:
http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf

When the document is read with PoDoFo on Windows, an error is thrown.

Could well be a problem with the document itself, but given the status of the 
document users are likely to assume the document is correct.

Here's a stack trace for source of the exception:

_CxxThrowException(void * pExceptionObject=0x1127e098, const _s__ThrowInfo * pThrowInfo=0x012ef488)  Line 161   C++
PoDoFo::PdfTokenizer::GetNextNumber()  Line 301 C++
PoDoFo::PdfParserObject::ReadObjectNumber()  Line 92 + 0xb bytes C++
PoDoFo::PdfParserObject::ParseFile(PoDoFo::PdfEncrypt * pEncrypt=0x, bool bIsTrailer=false)  Line 131   C++
PoDoFo::PdfParser::ReadObjectsInternal()  Line 947  C++
PoDoFo::PdfParser::ReadObjects()  Line 930  C++
PoDoFo::PdfParser::ParseFile(const PoDoFo::PdfRefCountedInputDevice & rDevice={...}, bool bLoadOnDemand=true)  Line 191 + 0x8 bytes C++
PoDoFo::PdfParser::ParseFile(const wchar_t * pszFilename=0x03825600, bool bLoadOnDemand=true)  Line 155 C++
PoDoFo::PdfParser::PdfParser(PoDoFo::PdfVecObjects * pVecObjects=0x09014300, const wchar_t * pszFilename=0x03825600, bool bLoadOnDemand=true)  Line 72  C++
PoDoFo::PdfMemDocument::Load(const wchar_t * pszFilename=0x03825600)  Line 160 + 0x3d bytes C++
PoDoFo::PdfMemDocument::PdfMemDocument(const wchar_t * pszFilename=0x03825600)  Line 74 C++

Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com 
Registered in Scotland No 362274 30-31 Queen Street Edinburgh
Phone +44 845 056 8475






[Podofo-users] MarkInfo accessor

2010-03-08 Thread Mark Rogers
It would be useful if PoDoFo provided a PdfMemDocument property to access the 
MarkInfo dictionary used for storing information about PDFs tagged for 
accessibility.

The property is fairly easy to add to PdfMemDocument.h. Here's the code:

/** Get access to the MarkInfo dictionary (ISO 32000-1:2008 14.7.1)
 *  \returns PdfObject the MarkInfo dictionary
 */
PdfObject* GetMarkInfo() const { return GetNamedObjectFromCatalog( 
"MarkInfo" ); }

Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com 
Registered in Scotland No 362274 30-31 Queen Street Edinburgh
Phone +44 845 056 8475






[Podofo-users] Unterminated string causes PdfInputDevice::PdfInputDevice to fail at random

2010-02-14 Thread Mark Rogers
The PdfInputDevice constructor in trunk may fail because the filename is copied 
to a buffer (pStr) which isn't null terminated.

PdfInputDevice::PdfInputDevice( const wchar_t* pszFilename )
{
this->Init();

if( !pszFilename ) 
{
PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
}

try {
//m_pStream = new std::ifstream( pszFilename, std::ios::binary );
size_t strLen = wcslen(pszFilename);
char * pStr = new char[strLen+1];
wcstombs(pStr, pszFilename, strLen);

I think the last line should read:

wcstombs(pStr, pszFilename, strLen+1);
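Passing strLen+1 fixes the missing terminator when every wide character converts to a single byte. In multibyte locales (UTF-8, Shift-JIS) one wide character can need several bytes, so a buffer of strLen+1 bytes may still be too small. A sketch that asks wcstombs for the required size first (a null destination is accepted by POSIX and the MSVC CRT and returns the byte count without storing anything); `WideToMultibyte` is a hypothetical helper.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Convert a wide filename to a multibyte string in the current locale.
// wcstombs(nullptr, src, 0) returns the number of bytes needed, excluding
// the terminator; this can exceed wcslen(src) in multibyte locales.
std::string WideToMultibyte(const wchar_t* src) {
    std::size_t needed = std::wcstombs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("filename not representable in current locale");
    std::string out(needed + 1, '\0');      // room for the terminating NUL
    std::wcstombs(&out[0], src, out.size());
    out.resize(needed);                     // std::string tracks its own length
    return out;
}
```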

Regards
Mark

Mark Rogers - mark.rog...@electrum.co.uk
Electrum Multimedia Ltd - http://www. electrum.co.uk 
Registered in Scotland No 158435 Registered Office 50 Lothian Road 





Re: [Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-23 Thread Mark Rogers
There seems to be some sort of tagged text in there:

- the Read Out Loud feature of Adobe Reader does a good job of reading out the 
document and synchronising the reading to highlighted text on the document
- the online PDF to HTML converter at Adobe gets all the document structure 
right (including headings, tables and bulleted lists) when converting to HTML
  http://www.adobe.com/products/acrobat/access_onlinetools.html

I don't think either of these would work well without tagged structure.

All the documents I've noticed this issue with are quite recent (and mostly 
produced by Adobe). Is there a new version of the PDF format doing the rounds 
or is a recent version of Acrobat outputting stuff in an unexpected order?

Regards
Mark





[Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-23 Thread Mark Rogers
I've been trying to use PoDoFo to extract accessible text from PDFs.

 

I have some PDF documents tagged for accessibility which show as tagged in
Adobe Reader properties (Tagged: Yes), but PoDoFo::PdfDocument::GetStructTreeRoot
returns null. My (limited) understanding of ISO 32000 is
that the tagged text all lives under StructTreeRoot.

 

When I open one of the problem PDFs in PoDoFoBrowser it shows a reference
for StructTreeRoot, but the referenced node can't be expanded to show its
contents.  The StructTreeRoot node can be expanded in other tagged PDFs.

 

An example of a problem document is:

http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf

 

Any pointers or suggestions would be gratefully accepted.

 

Regards

Mark Rogers - mark.rog...@electrum.co.uk

 



[Podofo-users] Extracting Accessible Text

2009-03-19 Thread Mark Rogers
Hi

 

I'm trying to figure out how to extract text from a PDF into an
accessibility tool.

 

I've figured out how to walk the tagged structure returned by
GetStructTreeRoot, but stuck on how to get from an integer marked content
identifier (PDF 32000 14.7.2) to the actual text.

 

Looks like I probably need to parse the content stream using
PdfContentsTokenizer to scan for the corresponding marked content.
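A rough sketch of that scanning approach, independent of PoDoFo: a small hand-rolled tokenizer (standing in for PdfContentsTokenizer) reads an already-decompressed content stream, tracks BDC/BMC/EMC nesting, and appends the string shown by each Tj to the MCID of the innermost enclosing sequence that has one. Assumptions: MCIDs appear inline as `<</MCID n>>` (not via a named /Properties resource), text is shown only with Tj, and literal strings are taken as raw bytes, with no font encoding, hex strings, TJ arrays or octal escapes handled. `Tokenize` and `ExtractByMcid` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token type for a tiny content-stream tokenizer.
struct Token {
    enum Kind { Name, Number, String, Keyword, Delim } kind;
    std::string text;
};

// Minimal tokenizer: literal strings (balanced parentheses; a backslash
// keeps the following character as-is), names, numbers, << >> [ ] and
// operator keywords.
std::vector<Token> Tokenize(const std::string& s) {
    std::vector<Token> out;
    const std::string delims = "()<>[]/";
    std::size_t i = 0;
    while (i < s.size()) {
        char c = s[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '(') {
            std::string str;
            int depth = 1;
            ++i;
            while (i < s.size()) {
                char d = s[i++];
                if (d == '\\' && i < s.size()) { str += s[i++]; continue; }
                if (d == '(') ++depth;
                else if (d == ')' && --depth == 0) break;
                str += d;
            }
            out.push_back({Token::String, str});
        } else if (s.compare(i, 2, "<<") == 0 || s.compare(i, 2, ">>") == 0) {
            out.push_back({Token::Delim, s.substr(i, 2)});
            i += 2;
        } else if (c == '[' || c == ']') {
            out.push_back({Token::Delim, std::string(1, c)});
            ++i;
        } else {
            std::size_t start = i++;
            while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
                   delims.find(s[i]) == std::string::npos)
                ++i;
            Token::Kind k = (c == '/') ? Token::Name
                : (std::isdigit(static_cast<unsigned char>(c)) || c == '-' || c == '.') ? Token::Number
                : Token::Keyword;
            out.push_back({k, s.substr(start, i - start)});
        }
    }
    return out;
}

// Collect Tj text keyed by the MCID of the innermost enclosing marked-content
// sequence that has one. BDC/BMC open a level, EMC closes it.
std::map<int, std::string> ExtractByMcid(const std::string& content) {
    std::map<int, std::string> text;
    std::vector<int> stack;        // effective MCID per open level (-1 = none)
    std::vector<Token> tokens = Tokenize(content);
    int pendingMcid = -1;          // MCID seen in the property list before BDC
    std::string lastString;        // most recent string operand
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const Token& t = tokens[i];
        if (t.kind == Token::Name && t.text == "/MCID" && i + 1 < tokens.size() &&
            tokens[i + 1].kind == Token::Number) {
            pendingMcid = std::stoi(tokens[i + 1].text);
        } else if (t.kind == Token::String) {
            lastString = t.text;
        } else if (t.kind == Token::Keyword) {
            if (t.text == "BDC" || t.text == "BMC") {
                int own = (t.text == "BDC") ? pendingMcid : -1;
                int inherited = stack.empty() ? -1 : stack.back();
                stack.push_back(own >= 0 ? own : inherited);
                pendingMcid = -1;
            } else if (t.text == "EMC") {
                if (!stack.empty()) stack.pop_back();
            } else if (t.text == "Tj" && !stack.empty() && stack.back() >= 0) {
                text[stack.back()] += lastString;
            }
        }
    }
    return text;
}
```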

 

Is this the right thing to do? Is there an easier way to do this?

 

Thanks

Mark

 
