Hi all
I've been working with PDFName in my code and have run into a bit of an
oddity I was hoping for comments on.
For any given string `fred', the operation:
( new PDFName(fred) ).getName().equals(fred)
isn't guaranteed to be true, because PDFName.getName() returns the
*escaped* name. It strips the leading slash added by
PDFName.escapeName(), so a name with nothing in it that needs escaping
comes back unchanged, but any other name comes back in its escaped form.
That makes it a good candidate for creating exciting bugs.
I'd like to be able to use PDFName instead of String as a map key (for
clarity, mostly), but need to be able to get the original encapsulated
string quickly (without decoding) and reliably.
I'd like to change PDFName so that it keeps a reference to the original
name string and returns that from getName(). It should encode the name
on the first call to a new getEncodedName() method and cache the result
in a private field, so short-lived PDFName objects don't waste time
encoding strings. I'd also like getEncodedName() to return a byte[]
rather than a String, since an encoded PDF name isn't actually text data.
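As a rough sketch of the change described above (not FOP's actual code:
the class name LazyPdfName and the placeholder encode() helper are made
up for illustration, and the real escaping rules are left out):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; this is not FOP's PDFName. */
final class LazyPdfName {
    private final String name; // original name, exactly as supplied
    private byte[] encoded;    // built on the first getEncodedName() call

    LazyPdfName(String name) {
        if (name == null) {
            throw new NullPointerException("name");
        }
        this.name = name;
    }

    /** Returns the original string, never the escaped form. */
    String getName() {
        return name;
    }

    /** Encodes on first use, then returns a copy of the cached bytes. */
    byte[] getEncodedName() {
        if (encoded == null) {
            encoded = encode(name);
        }
        return encoded.clone();
    }

    // Placeholder (assumption): a leading slash plus the UTF-8 bytes.
    // Real code would also escape the bytes that need it.
    private static byte[] encode(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1];
        out[0] = '/';
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }

    // Equality on the original name, so instances work as map keys.
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyPdfName
                && name.equals(((LazyPdfName) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

With something shaped like this, new LazyPdfName(fred).getName().equals(fred)
holds for every non-null fred, and a map lookup never touches the encoder.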
BTW, is there any reason FOP's PDF library uses java.lang.String when
working with sequences of PDF data bytes? For example, the output of
PDFName.escapeName(...) isn't really a "string" at all, in that it's not
meaningful text in any encoding; it's just a byte sequence jammed into
the lower 8 bits of Unicode code points. It's pretty confusing having it
as a String (logically an array of Unicode characters) rather than as a
byte[].
Right now, FOP also writes 8-bit characters in names incorrectly: the
toHex(...) and PDFName.escapeName(...) methods take each *Unicode*
*character* in a String whose value is between 128 and 255 inclusive,
convert that value to hex and write it out. This is incorrect, because
PDF names should be UTF-8, so the name should be encoded to a UTF-8 byte
sequence first and then escaped byte by byte.
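To make the "encode to UTF-8, then escape" order concrete, here is a
minimal sketch. It is not FOP's escapeName(): the class name
PdfNameEscaper is hypothetical, and the escaping rule is my reading of
the PDF name syntax (write '#', the delimiter characters and any byte
outside 0x21..0x7E as #xx).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; this is not FOP's PDFName.escapeName(). */
final class PdfNameEscaper {
    // PDF delimiter characters, which cannot appear literally in a name.
    private static final String DELIMITERS = "()<>[]{}/%";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Returns the escaped name, with leading slash, as raw bytes. */
    static byte[] escapeName(String name) {
        // Step 1: turn the text into bytes. Escaping works on bytes,
        // not on Java chars.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(utf8.length + 1);
        out.write('/');
        // Step 2: escape each byte that cannot be written as itself.
        // Note: a NUL byte is not legal in a PDF name; this sketch
        // does not check for it.
        for (byte b : utf8) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean plain = v >= 0x21 && v <= 0x7E
                    && v != '#' && DELIMITERS.indexOf(v) < 0;
            if (plain) {
                out.write(v);
            } else {
                out.write('#');
                out.write(HEX[v >>> 4]);
                out.write(HEX[v & 0x0F]);
            }
        }
        return out.toByteArray();
    }
}
```

For the name "café", the é (U+00E9) becomes the two UTF-8 bytes C3 A9, so
this writes /caf#C3#A9. Escaping the character value directly would
write #E9 instead, which is not valid UTF-8 once the # sequences are
expanded.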
--
Craig Ringer
POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/