Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> On 2020-01-23 18:04, Robert Haas wrote:
>> Now, you might say "well, why don't we just do an encoding
>> conversion?", but we can't. When the filesystem tells us what the file
>> names are, it does not tell us what encoding the person who created
>> those files had in mind. We don't know that they had*any*  encoding in
>> mind. IIUC, a file in the data directory can have a name that consists
>> of any sequence of bytes whatsoever, so long as it doesn't contain
>> prohibited characters like a path separator or \0 byte. But only some
>> of those possible octet sequences can be stored in a manifest that has
>> to be valid UTF-8.
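Robert's point can be made concrete with a small sketch. It assumes a POSIX filesystem and uses Python's strict UTF-8 decoder. The helpers is_legal_filename and is_valid_utf8 are hypothetical illustrations, not PostgreSQL code. The sketch shows a name that the filesystem happily accepts but that cannot be written verbatim into a UTF-8 manifest.

```python
def is_legal_filename(name: bytes) -> bool:
    """A single POSIX path component: any non-empty byte sequence with no
    path separator and no NUL byte (ignoring ".", ".." and NAME_MAX limits)."""
    return len(name) > 0 and b"/" not in name and b"\0" not in name


def is_valid_utf8(name: bytes) -> bool:
    """True if the raw bytes could be stored unchanged in a UTF-8 manifest."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# "café" encoded in Latin-1 is b'caf\xe9': a legal file name, but not UTF-8.
latin1_name = "café".encode("latin-1")
print(is_legal_filename(latin1_name))  # True
print(is_valid_utf8(latin1_name))      # False
```

The filesystem only hands back the bytes, so nothing in this check can tell whether b'caf\xe9' was meant as Latin-1, some other single-byte encoding, or no text at all.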

> I think it wouldn't be unreasonable to require that file names in the 
> database directory be consistently encoded (as defined by pg_control, 
> probably).  After all, this information is sometimes also shown in 
> system views, so it's already difficult to process total junk.  In 
> practice, this shouldn't be an onerous requirement.

I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory.  There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context.  Why can't we just skip any such files?
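The skip rule could look roughly like this. It is a minimal sketch under stated assumptions: the directory is walked with a bytes path so names come back undecoded; a directory with a non-ASCII name is pruned along with everything under it; and split_by_ascii and files_to_back_up are hypothetical names, not anything in pg_basebackup.

```python
import os
from typing import Iterable, List, Tuple


def split_by_ascii(names: Iterable[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Partition raw names into (kept, skipped) by whether every byte is ASCII."""
    kept: List[bytes] = []
    skipped: List[bytes] = []
    for name in names:
        (kept if name.isascii() else skipped).append(name)
    return kept, skipped


def files_to_back_up(data_dir: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Walk data_dir (a bytes path, so os.walk yields undecoded names) and
    return (file paths to include, paths skipped for a non-ASCII name)."""
    included: List[bytes] = []
    skipped: List[bytes] = []
    for dirpath, dirnames, filenames in os.walk(data_dir):
        # Prune non-ASCII subdirectories in place so os.walk never enters them.
        keep_dirs, bad_dirs = split_by_ascii(dirnames)
        dirnames[:] = keep_dirs
        skipped.extend(os.path.join(dirpath, d) for d in bad_dirs)

        keep_files, bad_files = split_by_ascii(filenames)
        included.extend(os.path.join(dirpath, f) for f in keep_files)
        skipped.extend(os.path.join(dirpath, f) for f in bad_files)
    return included, skipped
```

Returning the skipped paths instead of silently dropping them is an assumption of this sketch; it would let a caller warn about files it left out of the backup.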

                        regards, tom lane

