Edit report at https://bugs.php.net/bug.php?id=63769&edit=1
ID: 63769
Comment by: hanskrentel at yahoo dot de
Reported by: hanskrentel at yahoo dot de
Summary: file:// protocol does not support percent-encoded characters
Status: Not a bug
Type: Bug
Package: Streams related
Operating System: Windows
PHP Version: 5.4.9
Block user comment: N
Private report: N
New Comment:
Pierre, that is not helpful. Should I say "as usual"?
Let me explain briefly, so you can see how easily you fool yourself:
You point to a standard reference (here, MSDN) to argue that % can be part of
a file name. I never denied that. So what do you want to say with that link?
Probably that file names are standardized inside an OS?
Well, that's fine.
My point is that a standard, URI percent-encoding, is not properly implemented
in PHP. It is a very common standard, by the way: although the file:// URI
scheme is not really standardized, URIs and percent-encoding *are*.
Now you bring in some other standard. You probably wanted to create the
impression that PHP itself actually follows that standard, but what can I tell
you: just as PHP does not follow the URI standard (as pointed out in this
issue), it does not properly implement the Windows rules for valid file names
either (!!!).
But this is not what my bug report is about.
Or did you just want to give an example showing that PHP does not even need
the file system's naming rules because it makes its own? That it does not have
to follow them because it is superior?
Previous Comments:
------------------------------------------------------------------------
[2013-01-16 17:40:40] [email protected]
It is your job to decode it; file:// does not have and does not follow the %
encoding used in other areas.
By the way, paths on Windows can contain the % character; see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
for a list of disallowed characters.
------------------------------------------------------------------------
[2013-01-16 16:18:41] hanskrentel at yahoo dot de
@ab:
Consider that you have a file with a space in its name, and you *need to*
specify the filename as a file:// URI, in which a space is a special
character that requires proper URL encoding.
That file:// URI, by the way, is set in an environment variable that requires
it (XML_CATALOG_FILES).
DOMDocument in PHP then uses that file:// URI internally and can't process
it properly because the wrapper is not able to decode the path
information.
You actually pretty well demonstrate the problem in your example:
php -r "echo file_get_contents('file://C:/my/path/catalog%202.xml');"
percent filename
That is obviously wrong. %20 in a file:// URI is an encoded space, so the content
space filename
needs to be output instead. The filename you meant is properly written as:
php -r "echo file_get_contents('file://C:/my/path/catalog%25202.xml');"
percent filename
compare: http://tools.ietf.org/html/rfc3986#section-2.1
Please add that example to yours, because only with the two opposite
cases (encoded *and* decoded) can you actually work out concrete results. You
just have the same example twice, and I think both show the same kind of
error: missing encoding in those URIs.
Which brings me to the point: is there actually any interest in fixing this? I
mean, there is not much standing in the way, if you ask me. Normally users do
not use file:// URIs at all.
Those who did most likely used a space (or would have complained earlier
here, but I could find no bug report). The only edge case I can see is
files containing percent signs, but how likely is that at all?
Let me know what the chances of getting this fixed would be if I sponsored a
well-written patch.
------------------------------------------------------------------------
[2013-01-08 17:12:54] [email protected]
@hanskrentel
That's my test:
- create file 'catalog%202.xml' with content "percent filename"
- create file 'catalog 2.xml' with content "space filename"
- then run
php -r "echo file_get_contents('file://C:/my/path/catalog%202.xml');"
percent filename
- then run
php -r "echo file_get_contents('file://C:/my/path/catalog 2.xml');"
space filename
That's pretty straightforward. That's what I mean: no decoding, both are
valid filenames. The decoding should be done in your app, depending on what it
needs. In your example, you create 'catalog 2.xml' and try to stat
'catalog%202.xml', literally. But 'catalog%202.xml' doesn't exist.
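A minimal sketch of such app-side decoding, assuming the URI always carries a
Windows drive-letter path as in the test above. file_uri_to_path() is a
hypothetical helper for illustration, not a PHP function:

```php
<?php
// Hypothetical helper (not part of PHP): turn a file:// URI into a plain
// path by percent-decoding it before passing it to the stream functions.
function file_uri_to_path($uri)
{
    // Only touch file:// URIs; anything else is returned unchanged.
    if (strncasecmp($uri, 'file://', 7) !== 0) {
        return $uri;
    }
    $path = substr($uri, 7);
    // Accept the three-slash form as well: file:///C:/... becomes C:/...
    if (preg_match('~^/[A-Za-z]:[/\\\\]~', $path) === 1) {
        $path = substr($path, 1);
    }
    // rawurldecode() decodes %xx sequences and leaves '+' untouched.
    return rawurldecode($path);
}

echo file_uri_to_path('file://C:/my/path/catalog%202.xml'), "\n";
echo file_uri_to_path('file://C:/my/path/catalog%25202.xml'), "\n";
```

With this helper, the first call prints C:/my/path/catalog 2.xml and the
second prints C:/my/path/catalog%202.xml, so each URI maps to exactly one of
the two test files.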
------------------------------------------------------------------------
[2013-01-06 07:03:56] anon at anon dot anon
Actually, hold on a sec, plus signs are *not* supposed to be decoded here. That
means that file names containing plus signs would not be broken by a fix, and
only file names containing a '%xx' (where x is a hexit) sequence would be
affected, which is probably uncommon. Perhaps you have a chance.
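To illustrate the point about plus signs: PHP's rawurldecode() decodes only
%xx sequences, while urldecode() additionally turns '+' into a space. A short
sketch, using a made-up file name:

```php
<?php
// Made-up file name containing both a plus sign and an encoded space.
$name = 'a+b%20c.xml';

// Percent-decoding only: the plus sign survives.
$raw = rawurldecode($name);

// Form-style decoding: the plus sign is also turned into a space.
$form = urldecode($name);

echo $raw, "\n";
echo $form, "\n";
```

The first line printed is a+b c.xml and the second is a b c.xml, so a fix
based on plain percent-decoding would leave names with plus signs intact.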
------------------------------------------------------------------------
[2013-01-06 06:38:45] anon at anon dot anon
>You would have wanted to access it via 'file:///C:/temp/catalog%%25202.xml'
Actually, 'catalog%25202.xml', but I know, I'm agreeing with you. I'm just
pointing out that this erroneous behavior may be depended on somewhere in some
PHP script, where the author, in good faith, did whatever made things work. I
assume you're going to pass your path through urldecode (or not encode it in
the first place), and then you'll be one of them.
In any case, you're unlikely to get any support here. The reviewers here don't
do much except dismiss things as 'Not a bug' and once they've successfully done
that they lose interest. C'est le PHP.
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=63769