On 5/11/20, Oleg Broytman <p...@phdru.name> wrote:
> On Mon, May 11, 2020 at 09:12:52PM -0000, Steve Jorgensen
> <ste...@stevej.name> wrote:
>
>> When the platform is Windows, certainly, "<letter>:" should not be
>> allowed, and perhaps colon should not be allowed at all.

The meaning of "<letter>:name" is context dependent. If it occurs at
the beginning of a path, it's relative to the working directory on
drive "<letter>:", which defaults to the root directory on the drive.
For example, if the working directory on drive "X:" is "X:\spam\eggs",
then "X:foo" resolves to "X:\spam\eggs\foo". "X:foo" in this context
is not a valid component name; it's actually a filepath.
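
As a rough, platform-independent illustration of the drive-relative rule above, the sketch below resolves "X:foo" against a per-drive working directory. The helper name resolve_drive_relative and the cwd_by_drive mapping are made up for this example; on a real system Windows tracks the per-drive working directory itself.

```python
import ntpath

def resolve_drive_relative(path, cwd_by_drive):
    """Resolve a drive-relative path such as 'X:foo' (illustrative only)."""
    drive, rest = ntpath.splitdrive(path)
    is_letter_drive = len(drive) == 2 and drive[0].isalpha() and drive[1] == ":"
    if not is_letter_drive or rest[:1] in ("\\", "/"):
        return path  # not drive-relative; leave it alone
    # Assumption of this sketch: a drive with no recorded working
    # directory falls back to its root directory.
    cwd = cwd_by_drive.get(drive.upper(), drive.upper() + "\\")
    return ntpath.normpath(ntpath.join(cwd, rest))

print(resolve_drive_relative("X:foo", {"X:": "X:\\spam\\eggs"}))
print(resolve_drive_relative("Y:foo", {}))
```

Because it only uses ntpath, this runs on any platform, but it is a model of the behavior, not a query of the actual OS state.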

Otherwise "<letter>:" is part of an NTFS or ReFS stream path, where
":" is the stream delimiter. To be valid, it needs to be followed by
either the name of the stream or the name plus the type, e.g.
"filename:streamname" or "filename:streamname:streamtype".

Should file streams be supported?

More on File Streams

An open or create will fail as an invalid filename if it uses invalid
stream syntax or references a stream type that's unknown, or if the
filesystem doesn't support streams and disallows colon in filenames
(e.g. FAT32).

The stream name can be empty to indicate an anonymous or default
stream, but only if the stream type is specified. For example, in NTFS
"filename::$DATA" is the anonymous data stream in a file named
"filename". For a regular data file, it's the same as just accessing
"filename".

A directory can have named data streams, but it cannot have an
anonymous data stream. The default stream in a directory is an index
stream named "$I30". The following are equivalent names for a
directory in NTFS: "dirname", "dirname::$INDEX_ALLOCATION", and
"dirname:$I30:$INDEX_ALLOCATION". But "dirname:$I30" doesn't work
because the default stream type is $DATA.
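
The stream syntax described above can be sketched as a small parser for a single path component (not a whole path, since a drive letter also uses a colon). The function name split_stream is hypothetical, and the sketch does not check whether the stream type is one the filesystem actually knows; it also assumes that an explicitly empty type, as in "name:stream:", is rejected.

```python
def split_stream(component, default_type="$DATA"):
    """Split 'name[:stream[:type]]' into (name, stream, type)."""
    parts = component.split(":")
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"invalid stream syntax: {component!r}")
    name = parts[0]
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else ""
    # An empty stream name is only allowed when the type is given.
    if len(parts) > 1 and not stream and not stream_type:
        raise ValueError(f"empty stream name without a type: {component!r}")
    # Assumption of this sketch: a trailing empty type is an error.
    if len(parts) == 3 and not stream_type:
        raise ValueError(f"empty stream type: {component!r}")
    return name, stream, stream_type or default_type

print(split_stream("filename:streamname"))
print(split_stream("filename::$DATA"))
print(split_stream("dirname:$I30:$INDEX_ALLOCATION"))
```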

To access a stream in a single-letter filename relative to the current
directory, the current directory has to be referenced explicitly via
the "." component. For example, "./C:spam" is a stream named "spam" in
a file named "C" that's in the current working directory, but "C:spam"
is a file named "spam" in the working directory on drive "C:".
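
A minimal sketch of that workaround: build "name:stream" and prefix "./" only when the result would otherwise be read as a drive-letter path. The helper name stream_path is invented for this example; ntpath.splitdrive is used only to detect the ambiguity.

```python
import ntpath

def stream_path(filename, stream):
    """Return a path to `stream` in `filename` in the current directory."""
    path = f"{filename}:{stream}"
    drive, _ = ntpath.splitdrive(path)
    if drive:
        # 'C:spam' would mean "spam" on drive C:, so anchor it with "."
        path = "./" + path
    return path

print(stream_path("C", "spam"))
print(stream_path("data", "spam"))
```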

>    Forbidden characters:
>
> chr(0) < > : " / \ | ? *
>
> characters in range from chr(1) through chr(31),

See the above discussion regarding ":". An NTFS stream name can
include any character except for nul (0), colon, backslash, and slash.

The characters *?"<> are the 5 wildcard characters that almost all NT
filesystems disallow in filenames. These are important to disallow
because the filesystem driver (in the kernel) is expected to support
filtering a directory listing with a wildcard pattern. NT's * and ?
wildcards have Unix shell semantics. The other three are DOS_DOT ("),
DOS_STAR (<), and DOS_QM (>), which help to emulate MS-DOS behavior.

The vertical bar or pipe (|) has no significance in filepaths, but
it's a special shell character that's usually disallowed in filenames.
Control characters 1-31 usually are also disallowed. That said, some
non-Microsoft filesystems may allow these characters. For example, the
VirtualBox shared-folder filesystem allows pipe and control characters
in filenames.
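
For reference, a conservative check along these lines might look like the sketch below. It groups the characters by the reason they are disallowed, as discussed above. The names FORBIDDEN and forbidden_chars are made up for this example, and the check is deliberately strict: as noted, some filesystems accept a few of these characters.

```python
WILDCARDS = set('*?"<>')      # NT wildcards, including the DOS_* ones
SEPARATORS = set("\\/")       # path separators
OTHER = set(":|")             # drive/stream delimiter and shell pipe
CONTROLS = {chr(i) for i in range(32)}  # chr(0) through chr(31)
FORBIDDEN = WILDCARDS | SEPARATORS | OTHER | CONTROLS

def forbidden_chars(name):
    """Return the sorted list of disallowed characters found in `name`."""
    return sorted(set(name) & FORBIDDEN)

print(forbidden_chars("spam.txt"))
print(forbidden_chars("a<b>|c"))
```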

> a space or a period at the end of file/directory name.

Trailing spaces and dots are stripped from the final path component in
almost all contexts. The exception is "\\?\" device paths, which are
never normalized in an open or create context. For example, creating
"\\?\C:\Temp\spam. . . " will name the file "spam. . . " instead of
the normal name "spam". The name "spam. . . " will appear in the
directory listing, but opening it will require using a "\\?\" device
path.
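
A pure-Python approximation of this stripping rule, for illustration only: it touches just the final component and leaves "\\?\" paths alone. The function name normalize_final_component is hypothetical, and the real normalization is done by Windows, not by this code.

```python
import ntpath

def normalize_final_component(path):
    """Strip trailing dots and spaces from the last component only."""
    if path.startswith("\\\\?\\"):
        return path  # "\\?\" device paths are not normalized
    _, tail = ntpath.split(path)
    if tail in (".", ".."):
        return path  # keep the special directory names intact
    return path[: len(path) - len(tail)] + tail.rstrip(" .")

print(normalize_final_component("foo/spam. . . "))
print(normalize_final_component("spam. . . /foo"))
print(normalize_final_component("\\\\?\\C:\\Temp\\spam. . . "))
```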

> Forbidden file names (with any extensions):
>
> CON, PRN, AUX, NUL,
> COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9,
> LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.

In an attempt to replicate how MS-DOS implemented devices, Windows
reserves DOS device names such as "NUL" in the final component of DOS
drive-letter paths and relative paths. They are not reserved in the
final component of UNC and device paths, though a server may disallow
them by policy, as Microsoft's SMB server does.

Matching the device name ignores everything after a trailing colon or
dot that follows the name with 0 or more intervening spaces. This is
more than ignoring an extension, which is typically taken as the
characters following the last dot in a filename.

"CONIN$" and "CONOUT$" are mistakenly excluded from the documented
list of reserved DOS device names. Windows has always reserved them as
unqualified relative names in a create/open context. Starting with
Windows 8, they're reserved exactly the same as the classic DOS device
names.
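
The matching rule can be approximated with a regular expression. This is a sketch under the rules stated above, applied to the final path component only; the names _DEVICE_RE and reserved_device are invented here, and the result is not guaranteed to agree with every Windows version.

```python
import re

# Device name, then optional spaces, then optionally a dot or colon
# followed by anything at all (which is ignored when matching).
_DEVICE_RE = re.compile(
    r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]|CONIN\$|CONOUT\$)"
    r" *([.:].*)?$",
    re.IGNORECASE | re.DOTALL,
)

def reserved_device(component):
    """Return the matched device name in upper case, or None."""
    m = _DEVICE_RE.match(component)
    return m.group(1).upper() if m else None

print(reserved_device("conin$:spam.eggs"))
print(reserved_device("conin$  .spam.eggs"))
print(reserved_device("nulx"))
```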

Examples with trailing dots and spaces:

    >>> import os, nt
    >>> os.getcwd()
    'C:\\'
    >>> nt._getfullpathname('spam. . . ')
    'C:\\spam'
    >>> nt._getfullpathname('foo/spam. . . ')
    'C:\\foo\\spam'

DOS devices:

    >>> nt._getfullpathname('conin$:spam.eggs')
    '\\\\.\\conin$'
    >>> nt._getfullpathname('foo/conin$  .spam.eggs')
    '\\\\.\\conin$'

Non-final component:

    >>> nt._getfullpathname('spam. . . /foo')
    'C:\\spam. . . \\foo'
    >>> nt._getfullpathname('conin$/foo')
    'C:\\conin$\\foo'

UNC and device paths:

    >>> nt._getfullpathname('//server/share/conin$')
    '\\\\server\\share\\conin$'
    >>> nt._getfullpathname('//./C:/conin$')
    '\\\\.\\C:\\conin$'
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GUDBXZOFLGKTM2F5233F3UW5UP2BBRPK/
Code of Conduct: http://python.org/psf/codeofconduct/