Hi

I think I’ve got to the root cause of the reported problems in PdfString.

A PdfString can be in one of two states, valid or invalid, as the IsValid() 
documentation says:

    /** The string is valid if no error in the constructor has occurred.
     *  If it is valid it is safe to call all the other member functions.
     *  \returns true if this is a valid initialized PdfString
     */
    inline bool IsValid() const;

The default PdfString constructor deliberately constructs an invalid string - 
this is used for things like PdfString::StringNull (which is different from an 
empty string) and is returned by various methods like 
PdfInfo::GetStringFromInfoDict, PdfField::GetFieldName, 
PdfField::GetAlternateName. There are other PdfString constructors that also 
create an invalid string: PdfString( (char*)NULL ) for example.

When IsValid() returns false, using the PdfString causes various undefined 
behaviours:

- GetLength / GetUnicodeLength / GetCharacterLength return -1 or -2
- ToUnicode faults accessing a NULL pointer
- PdfEncoding::ConvertToUnicode tries to allocate (SIZE_MAX-1)/2 or 
(SIZE_MAX-2)/2 bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfSimpleEncoding::ConvertToUnicode tries to allocate SIZE_MAX-1 bytes and 
throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfIdentityEncoding::ConvertToEncoding tries to allocate SIZE_MAX-1 bytes 
and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfDifferenceEncoding::ConvertToUnicode tries to allocate SIZE_MAX-1 or 
SIZE_MAX-2 bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfDifferenceEncoding::ConvertToEncoding tries to allocate SIZE_MAX-1 or 
SIZE_MAX-2 bytes and throws ePdfError_OutOfMemory if GetCharacterLength < 0
- PdfString comparison and equality operators may fault or throw 
ePdfError_OutOfMemory if they try to convert the encoding of one of the operands
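
To illustrate the allocation failures in the list above, here is a minimal 
standalone sketch of how a negative length becomes a huge size_t request. It 
is not PoDoFo code: UnpatchedLength, BufferSizeFor and IsAbsurdRequest are 
hypothetical names, and the exact sizes quoted above depend on the arithmetic 
each method does around the length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a length getter that reports an invalid string
// with a negative value, as the unpatched getters are described as doing.
inline long UnpatchedLength(bool valid, long length) {
    return valid ? length : -1;
}

// A caller that sizes a buffer from the length without checking its sign.
// Converting a negative value to size_t wraps around modulo 2^N.
inline std::size_t BufferSizeFor(long length) {
    return static_cast<std::size_t>(length);
}

// True if the request is larger than half the address space, which no
// allocator can be expected to satisfy.
inline bool IsAbsurdRequest(std::size_t bytes) {
    return bytes > SIZE_MAX / 2;
}

// Small self-check of the wrap-around described above.
inline void CheckWrapAround() {
    assert(BufferSizeFor(-1) == SIZE_MAX);
    assert(IsAbsurdRequest(BufferSizeFor(-2)));
}
```

A request of that size cannot succeed, which would be consistent with the 
ePdfError_OutOfMemory errors reported.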

I think the problems happen because none of the PoDoFo code checks 
PdfString::IsValid, apart from PdfString::GetStringUtf8. I would guess the same 
is true of most PoDoFo client code.

The patch gives PdfString methods documented, well-defined, safe behaviour if 
IsValid() returns false:

- PdfString::GetLength / PdfString::GetUnicodeLength / 
PdfString::GetCharacterLength return 0 (this prevents allocations of SIZE_MAX-1 
or SIZE_MAX-2)
- PdfString::ToUnicode returns an invalid string if it’s called on an invalid 
string
- the < and > operators return false if LHS and/or RHS are invalid 
- the == operator returns false if exactly one of LHS and RHS is invalid 
- the == operator returns true if both LHS and RHS are invalid 
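
The rules listed above can be sketched in a few lines of standalone C++. This 
is only an illustration of the intended semantics, not the patch itself: 
SketchString, Length, Equal, Less and Greater are hypothetical names standing 
in for PdfString and its operators.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for PdfString: a validity flag plus contents.
struct SketchString {
    bool valid;
    std::string data;
};

// Invalid strings report length 0 instead of a negative error value.
inline long Length(const SketchString& s) {
    return s.valid ? static_cast<long>(s.data.size()) : 0;
}

// == : two invalid strings are equal; an invalid and a valid one are not.
inline bool Equal(const SketchString& a, const SketchString& b) {
    if (!a.valid || !b.valid)
        return a.valid == b.valid;
    return a.data == b.data;
}

// < : false whenever either operand is invalid.
inline bool Less(const SketchString& a, const SketchString& b) {
    if (!a.valid || !b.valid)
        return false;
    return a.data < b.data;
}

// > : defined via <, so it is also false if either operand is invalid.
inline bool Greater(const SketchString& a, const SketchString& b) {
    return Less(b, a);
}
```

With these rules no code path needs to convert or measure an invalid string, 
so the comparison cannot reach the faulting or allocating code.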

The patch is designed to only change behaviour when the current behaviour is 
bad (i.e. access faults or out of memory errors). Where the current behaviour 
is reasonable there are no changes other than documenting the behaviour.

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

Attachment: patch-pdfstring-20160528.diff

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users