On 2/27/2017 12:11 AM, zyx wrote:
> I still think that PoDoFo does things according to ISO 32000:2008 (see
> the notes at the end of page 101 and the text around).
It seems that section 7.11.2.3, "Conversion to Platform-Dependent File Names", does not state
requirements for a PDF writer application. Rather, it explains how the platform-independent file
paths of a FileSpec are expanded on different platforms. If so, shouldn't the charset limitations
described in Note 1 simply indicate which characters are definitely safe on any given platform?
That would mean the requirements for platform-independent paths are specified in sections
7.11.2.1, 7.11.2.2 and 7.11.2.3. Those sections, however, do not explicitly specify any character
restrictions for file paths (there are restrictions for URLs, which follow a somewhat different set
of rules). Moreover, per section 7.11.3, Table 44, both the "F" and the "UF" keys are subject to
section 7.11.2, yet PdfFileSpec has no handling for the "UF" key.
Rules concerning the path specification:
- '/' is the only path separator (a '/' can be made part of a file name by preceding it with a
backslash; inside a PDF literal string that backslash must itself be escaped, so the sequence is
written as \\/)
- "/filepath/filename" is an absolute path; "filepath/filename" is a relative path
- ".." is used in relative paths to move up one level in the file system hierarchy
- there are no restrictions on how the path is written as long as it can be expanded to a
correct absolute path (e.g. both "../a/b/../../c/d.txt" and "../c/d.txt" can be used)
The above seems to call for real path handling rather than simple character substitution.
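The rules above can be turned into a small normalization routine. The following is a minimal Python sketch, not PoDoFo code: it assumes the PDF string-level escapes have already been decoded (so an escaped slash arrives as a backslash followed by '/'), and the choices to skip empty components and to drop '..' above the root of an absolute path are my own assumptions, not something the standard states.

```python
def split_components(spec: str) -> list[str]:
    """Split a decoded file specification string on unescaped '/'.

    Assumes PDF string-level escaping has already been removed, so a '/'
    that belongs to a file name arrives as a backslash followed by '/'.
    """
    components: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(spec):
        if spec.startswith("\\/", i):
            current.append("/")  # escaped solidus: part of the component name
            i += 2
        elif spec[i] == "/":
            components.append("".join(current))  # unescaped solidus: separator
            current = []
            i += 1
        else:
            current.append(spec[i])
            i += 1
    components.append("".join(current))
    return components


def normalize(spec: str) -> str:
    """Collapse '..' components so equivalent specifications compare equal."""
    absolute = spec.startswith("/")
    parts = split_components(spec)
    if absolute:
        parts = parts[1:]  # drop the empty component before the leading '/'
    out: list[str] = []
    for part in parts:
        if part == "":
            continue  # assumption: empty components (e.g. "a//b") are ignored
        if part == "..":
            if out and out[-1] != "..":
                out.pop()  # move up one level
            elif not absolute:
                out.append("..")  # a leading '..' must stay in a relative path
            # assumption: '..' above the root of an absolute path is dropped
        else:
            out.append(part)
    joined = "/".join(p.replace("/", "\\/") for p in out)  # re-escape names
    return "/" + joined if absolute else joined


print(normalize("../a/b/../../c/d.txt"))  # ../c/d.txt
print(normalize("/a/../b"))               # /b
```

With this, "../a/b/../../c/d.txt" and "../c/d.txt" normalize to the same string, a comparison that per-character substitution cannot provide.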
> ... that 2E is a Unicode variant of a dot and it's understood by Acrobat
> Reader
According to section 7.11.3, Table 44, the "F" key is specified for backwards compatibility. Doesn't
this mean that the difference between the "F" and "UF" keys is basically that the former is in
ASCII and the latter is in UTF-16? If so, shouldn't any limitations concern only characters that
cannot be represented in 7-bit ASCII?
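A minimal Python sketch of that distinction (the helper names are hypothetical, not PoDoFo API). It assumes "UF" is written as a UTF-16BE text string with the FE FF byte order mark, one of the encodings ISO 32000-1 allows for text strings, and it lists the characters that a 7-bit ASCII "F" value could not carry. The low bytes of the Cyrillic code units happen to be printable ASCII, which is consistent with the "?>4>D>" part of the UF values in the Acrobat output.

```python
def uf_bytes(path: str) -> bytes:
    """Encode a path as a PDF text string: FE FF byte order mark + UTF-16BE."""
    return b"\xfe\xff" + path.encode("utf-16-be")


def chars_outside_ascii(path: str) -> set[str]:
    """Characters that a 7-bit ASCII "F" value cannot represent."""
    return {ch for ch in path if ord(ch) > 0x7F}


data = uf_bytes("подофо/text")
# Every UTF-16 code unit is two bytes: ASCII characters get a 00 high byte,
# Cyrillic letters (U+043F, U+043E, ...) get a 04 high byte.
print(data[:4].hex())                     # feff043f
print(bytes(data[3::2]).decode("ascii"))  # ?>4>D>/text
print(sorted(chars_outside_ascii("подофо/`-=~!@#$% ^&()_+[]{}';.,")))  # ['д', 'о', 'п', 'ф']
```

Every punctuation character in the second test name is plain ASCII, so only the Cyrillic letters strictly require "UF".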
Just to see how the Adobe SDK handles FileSpecs (out of curiosity, not for any practical use), I
used Adobe Acrobat Pro to add two new resources to an existing Flash annotation:
- "подофо\text.123.4df.txt"
- "подофо\`-=~!@#$% ^&()_+[]{}';.,"
This created the following FileSpecs (plain text from the PDF file):
67 0 obj
<</EF<</F 70 0 R>>/F(....../`-=~!@#$% ^&\(\)_+[]{}';.,)/Type/Filespec/UF(??>4>D> / ` - = ~ ! @ # $ % ^ & \( \) _ + [ ] { } ' ; . ,)>>
endobj
68 0 obj
<</EF<</F 69 0 R>>/F(....../text.123.4df.txt)/Type/Filespec/UF(??>4>D> / t e x t . 1 2 3 . 4 d f . t x t)>>
endobj
The "UF" keys are kept in UTF-16, while in the "F" keys the characters that cannot be represented
in ASCII are replaced by dots (and the Windows '\' separator is converted to '/'). So it seems that
a FileSpec can function as long as its file path does not contain any characters that are illegal
in file names on the given platform.
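The substitution Acrobat appears to apply when building "F" can be mimicked in a few lines of Python. This is a guess fitted to the two outputs above (one '.' per non-ASCII character, '\' turned into '/'), not a description of the Adobe SDK, and the function name is hypothetical.

```python
def acrobat_like_f_value(path: str) -> str:
    """Approximate the observed Acrobat "F" value for a Windows-style path."""
    out: list[str] = []
    for ch in path:
        if ch == "\\":
            out.append("/")  # Windows separator -> PDF path separator
        elif ord(ch) > 0x7F:
            out.append(".")  # not representable in 7-bit ASCII
        else:
            out.append(ch)  # ASCII characters, punctuation included, kept as is
    return "".join(out)


print(acrobat_like_f_value("подофо\\text.123.4df.txt"))  # ....../text.123.4df.txt
```

The \( and \) in the dumped "F" value are ordinary PDF literal-string escapes added when the string is serialized, so they do not appear in this character-level view.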
This suggests that Note 1 of section 7.11.2.3 describes a very conservative list of characters
that are safe to use on any platform.
As for URLs, they conform to RFC 1808, with unsafe characters specified and escaped according to
RFC 1738 (or the more recent RFC 3986).
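For comparison, URL-style escaping can be sketched with Python's standard urllib.parse.quote, which (since Python 3.7) treats the RFC 3986 unreserved set as safe. Encoding non-ASCII characters as UTF-8 before percent-escaping is an assumption here; RFC 1738 does not fix a character encoding. The helper name is hypothetical.

```python
from urllib.parse import quote


def escape_url_path(path: str) -> str:
    """Percent-encode a URL path, leaving '/' separators unescaped."""
    # quote() leaves letters, digits and "_.-~" alone and encodes every
    # other character as %XX over its UTF-8 bytes.
    return quote(path, safe="/")


print(escape_url_path("docs/a b.txt"))  # docs/a%20b.txt
print(escape_url_path("подофо/x"))      # %D0%BF%D0%BE%D0%B4%D0%BE%D1%84%D0%BE/x
```

Unlike file specification strings, URLs do have an explicit escaping rule, which is why the two cases follow different sets of restrictions.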
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users