On 2/27/2017 12:11 AM, zyx wrote:

> I still think that PoDoFo does things according to ISO 32000:2008 (see
> the notes at the end of page 101 and the text around).
        It seems that section 7.11.2.3, "Conversion to Platform-Dependent
File Names", is not a set of requirements for a PDF writer application,
but rather an explanation of how the FileSpec's platform-independent
file paths are expanded on different platforms. In that case, doesn't
the character set limitation described in Note 1 merely show which
characters are definitely safe on any given platform?
        This would mean that the requirements for platform-independent
paths are specified in sections 7.11.2.1, 7.11.2.2 and 7.11.2.3. Those
sections, however, do not explicitly specify any character restrictions
for file paths (there are restrictions for URLs, which follow a slightly
different set of rules). Moreover, as per section 7.11.3, table 44, both
the "F" and "UF" keys are subject to section 7.11.2, but PdfFileSpec
does not handle the "UF" key at all.

        Rules concerning the path specification:
 - '/' is the only path separator (it can be made part of a file name by
escaping it with "\\", i.e. writing it as \\/)
 - "/filepath/filename" is an absolute path; "filepath/filename" is a
relative path
 - ".." is used in relative paths to move up one level in the file
system hierarchy
 - there are no restrictions on how the path is represented, as long as
it can be expanded to a correct absolute path (e.g. both
"../a/b/../../c/d.txt" and "../c/d.txt" can be used)
        The above seems to call not for simple character substitution,
but for proper path handling.
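
To illustrate what I mean by path handling, here is a rough Python
sketch of the rules listed above. It is not PoDoFo code: the function
names are made up for this mail, and skipping empty components and
ignoring ".." above the root of an absolute path are my own assumptions,
not something I found in the standard.

```python
def split_components(path: str) -> list[str]:
    """Split on unescaped '/' only; an escaped '\\/' stays inside a component."""
    parts, current, i = [], [], 0
    while i < len(path):
        if path[i] == "\\" and i + 1 < len(path) and path[i + 1] == "/":
            current.append("/")        # escaped separator is part of the name
            i += 2
        elif path[i] == "/":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(path[i])
            i += 1
    parts.append("".join(current))
    return parts


def normalize(path: str) -> tuple[bool, list[str]]:
    """Return (is_absolute, components) with resolvable '..' collapsed."""
    absolute = path.startswith("/")
    out: list[str] = []
    for part in split_components(path):
        if part == "":
            continue                   # assumption: empty components are skipped
        if part == ".." and out and out[-1] != "..":
            out.pop()                  # step back over the previous component
        elif part == ".." and absolute:
            continue                   # assumption: cannot go above the root
        else:
            out.append(part)           # plain name, or a leading '..'
    return absolute, out
```

With this, "../a/b/../../c/d.txt" and "../c/d.txt" normalize to the same
component list, while a name containing an escaped separator stays one
component.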

> that 2E is a Unicode variant of a dot and it's understood by an Acrobat
> Reader
        According to section 7.11.3, table 44, the "F" key is specified
for backwards compatibility. Doesn't this mean that the difference
between the "F" and "UF" keys is basically that the former is in ASCII
and the latter is in UTF-16? Following that, shouldn't any limitations
concern only characters not representable in 7-bit ASCII?

        Just to see how the Adobe SDK handles FileSpec (not for any
practical use, but out of curiosity), I used Adobe Acrobat Pro to add
two new resources to an existing Flash annotation:
 - "подофо\text.123.4df.txt"
 - "подофо\`-=~!@#$% ^&()_+[]{}';.,"

This creates the following FileSpecs (plain text from the PDF file):

67 0 obj
<</EF<</F 70 0 R>>/F(....../`-=~!@#$% ^&\(\)_+[]{}';.,)/Type/Filespec/UF(??>4>D> / 
` - = ~ ! @ # $ %   ^ & \( \) _ + [ ] { } ' ; . ,)>>
endobj
68 0 obj
<</EF<</F 69 0 R>>/F(....../text.123.4df.txt)/Type/Filespec/UF(??>4>D> / t e x 
t . 1 2 3 . 4 d f . t x t)>>
endobj

The "UF" keys are left in UTF-16, and in the "F" key the symbols that
cannot be represented in ASCII are replaced by dots. So it seems that the
FileSpec can function as long as its file path does not contain any
characters that are illegal in file names on the given platform. From
this follows the idea that Note 1 of section 7.11.2.4 describes a very
conservative list of characters that are safe to use on any platform.
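
For reference, the behaviour seen in the dump above can be mimicked with
a few lines of Python. This is only my reading of the output, not
Adobe's algorithm: the function name is hypothetical, I assume the "UF"
value is UTF-16BE with a byte order mark (the usual PDF text string
convention), and the escaping of parentheses done when the string is
serialized into the PDF is left out.

```python
import codecs


def observed_filespec_strings(platform_path: str) -> tuple[bytes, bytes]:
    """Build ("F", "UF") values the way the Acrobat output above appears to."""
    # The Windows separator becomes the platform-independent '/'.
    portable = platform_path.replace("\\", "/")
    # "F": every character outside 7-bit ASCII is replaced by a dot.
    f_value = "".join(ch if ord(ch) < 128 else "." for ch in portable)
    # "UF": the full name, assumed to be UTF-16BE preceded by a BOM.
    uf_value = codecs.BOM_UTF16_BE + portable.encode("utf-16-be")
    return f_value.encode("ascii"), uf_value
```

Feeding it "подофо\text.123.4df.txt" gives "....../text.123.4df.txt" for
"F", which matches object 68 above.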

        As for URLs, they conform to RFC 1808 (unsafe symbols are
specified and escaped according to RFC 1738 or the more recent RFC 3986).
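
A small sketch of what that escaping looks like, using Python's standard
urllib (which follows the RFC 3986 unreserved set). The helper name is
made up, and encoding non-ASCII characters as UTF-8 before
percent-escaping is an assumption on my side, since RFC 1738 does not
say which encoding to use.

```python
from urllib.parse import quote, unquote


def escape_url_path(path: str) -> str:
    """Percent-escape everything except unreserved characters and '/'."""
    # Assumption: non-ASCII characters are encoded as UTF-8 first.
    return quote(path, safe="/", encoding="utf-8")


escaped = escape_url_path("подофо/text 1.txt")
restored = unquote(escaped)  # round-trips back to the original string
```

Here the space becomes %20 and each Cyrillic letter becomes two escaped
bytes, while '/' is kept as the separator.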
