Hello everyone,
I am developing an application that extracts text from almost any
PDF document. To do this, I have to parse each font's ToUnicode
stream, which contains a CMap.
In a small percentage of files (3 out of 2000 in my
test batch), the CMap looks like this:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo <<
/Registry (Arial,Bold+0) def
/Ordering (T1UV) def
/Supplement 0 def
>> def
/CMapName /Arial,Bold+0 def
1 begincodespacerange <20> <ff> endcodespacerange
31 beginbfrange ....
The embedded CIDSystemInfo dictionary has the word def trailing each
entry, which makes it unreadable by my parser: the parser expects the
strict << /Name Value >> structure, and this extra word breaks it.
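For context, here is a minimal Python sketch of the stopgap I am considering. It rests on the unverified assumption that a bare def token inside << ... >> can simply be skipped. The function name parse_lenient_dict and the token pattern are made up for illustration and are not my actual parser; the pattern only handles names, strings without nested parentheses, and integers.

```python
import re

# Tokens: /Name, (string) without nested parentheses, integer, or bare keyword.
TOKEN = re.compile(r"/[^\s/()<>\[\]]+|\([^()]*\)|-?\d+|[A-Za-z]+")


def parse_lenient_dict(body):
    """Parse the text between << and >>, ignoring a stray `def` after a value.

    Assumption (not verified against any spec): a bare `def` keyword inside
    the dictionary carries no information and can be dropped.
    """
    tokens = [t for t in TOKEN.findall(body) if t != "def"]
    result = {}
    # After dropping `def`, the tokens should alternate key, value.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not key.startswith("/"):
            raise ValueError(f"expected a name, got {key!r}")
        if value.startswith("("):
            value = value[1:-1]          # strip the string delimiters
        elif value.lstrip("-").isdigit():
            value = int(value)
        result[key[1:]] = value          # drop the leading slash from the name
    return result


body = "/Registry (Arial,Bold+0) def /Ordering (T1UV) def /Supplement 0 def"
print(parse_lenient_dict(body))
```

The same function also accepts the strict form without def, so it would not change how the other files in my batch are read. Whether skipping def is actually correct is exactly what I am unsure about.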
My questions are: what exactly does the word
def mean, and what's it doing inside a dictionary? Can someone
shed some light?
Any help would be greatly appreciated!
Peter
