Re: tar vs device special files
>> It appears to be specifying pax's behaviour, not tar's.  [...]

> POSIX used to specify tar, long ago, but there were (as I understand
> it) too many incompat variants, so it was dropped.

Not entirely surprising.

> You should have been expecting that as the link you were given ended
> in pax.html#some-tag-or-other

Yes, I noticed that...after the fact.

I got the file, thank you!

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: tar vs device special files
    Date:        Sat, 28 Oct 2023 23:32:56 -0400 (EDT)
    From:        Mouse
    Message-ID:  <202310290332.xaa23...@stone.rodents-montreal.org>

  | It appears to be specifying pax's behaviour, not tar's.  Is tar
  | specified to use the same format by reference, or is tar not specified
  | but everyone just implements it to use pax's ustar format, or what?

POSIX used to specify tar, long ago, but there were (as I understand it)
too many incompat variants, so it was dropped.  There is no standard for
tar (which makes it, as an interchange format, essentially useless).

However, as I understand things, most modern tar implementations are
in effect a variation on pax but only support pax's ustar format (and not
the others that pax also supports).

You should have been expecting that as the link you were given ended
in pax.html#some-tag-or-other

kre

ps: if I managed to somehow spam the list with a copy of the POSIX pax
spec (in PDF format), I apologise - I intended to send it just to mouse@
but didn't delete tech-userlevel ... I tried to kill it, but the network
was faster than I am, I believe.  Hopefully some list sanity checking
will have dropped the message, or something (it has not returned here).
Re: tar vs device special files
>> So there _is_ a POSIX spec for tarchives?  [...]

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

I got that fetched and have been going through it.

It appears to be specifying pax's behaviour, not tar's.  Is tar
specified to use the same format by reference, or is tar not specified
but everyone just implements it to use pax's ustar format, or what?

It also seems to me that significant fractions of it are
unimplementable on NetBSD because they demand recoding to or from UTF-8
for things that NetBSD handles as octet strings, not character strings
(which therefore cannot be recoded to or from UTF-8 even in principle),
such as user names in the system user database (/etc/{master.,}passwd
for NetBSD).  Is there a canonical way of handling such things?
passwd(5) on 9.0 and on 5.2 specify that /etc/passwd contains ASCII
records, but 5.2 vipw does not complain when I put a 0xe5 octet in a
record's username and homedir fields - and it appears to work just
fine, so any such restriction is not enforced.  This means software has
to do _something_ when faced with such things.  Is there some kind of
system-wide locale setting used for non-user-specific things like
usernames, or what?  I'm moderately sure there isn't any such thing on
5.2 and earlier.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
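
To make the recoding problem above concrete, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, the sample user name is invented, and the two candidate charsets are assumptions picked for the demonstration. Nothing here describes what NetBSD, pax, or any tar actually does. It shows that a name containing a lone 0xe5 octet is not valid UTF-8, and that "recoding to UTF-8" gives different results depending on which charset one guesses the octets were written in.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the octet string happens to be well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recode_to_utf8(raw: bytes, assumed_charset: str) -> bytes:
    """Recode octets to UTF-8 under an *assumed* source charset.

    The assumption is the whole problem: the octets themselves do not
    say which charset they were written in.
    """
    return raw.decode(assumed_charset).encode("utf-8")


if __name__ == "__main__":
    name = b"m\xe5use"  # invented user name containing a 0xe5 octet
    print(is_valid_utf8(name))
    # Same octets, two guesses, two different UTF-8 results.
    print(recode_to_utf8(name, "iso-8859-1"))
    print(recode_to_utf8(name, "koi8-r"))
```

Without some system-wide statement of which charset such fields use, any choice of `assumed_charset` is a guess, which is the gap the question above is pointing at.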
Re: tar vs device special files
>> So there _is_ a POSIX spec for tarchives?  [...]

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

I'll have to scare up a work machine to fetch that from, since
apparently pubs.opengroup.org is not interested in serving content over
HTTP.  But that should be doable; work these days tends to inflict
recent Linux on me, and, as unpleasant as I find that for most
purposes, it does mean things like curl with HTTPS support.

Thank you!

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: tar vs device special files
On Sunday, October 29, 2023 2:29:47 AM CET Mouse wrote:
> > I don't think any one else cares about pre-ustar.  Pretty much any
> > reader and writer around uses at least ustar and generally wants to
> > have extended POSIX as well when caring about large files.
>
> So there _is_ a POSIX spec for tarchives?  Is the spec available, or is
> this yet another pay-to-play "standard"?  I've gone looking for specs
> for tar before, but each time I have, I've been unable to find anything
> that isn't behind a paywall of one sort or another (and thus a total
> nonstarter for me).

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13

Joerg
Re: tar vs device special files
> Date: Sat, 28 Oct 2023 21:29:47 -0400 (EDT)
> From: Mouse
>
> So there _is_ a POSIX spec for tarchives?  Is the spec available, or is
> this yet another pay-to-play "standard"?  I've gone looking for specs
> for tar before, but each time I have, I've been unable to find anything
> that isn't behind a paywall of one sort or another (and thus a total
> nonstarter for me).
>
> Admittedly, I haven't looked recently.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
Re: tar vs device special files
> I don't think any one else cares about pre-ustar.  Pretty much any
> reader and writer around uses at least ustar and generally wants to
> have extended POSIX as well when caring about large files.

So there _is_ a POSIX spec for tarchives?  Is the spec available, or is
this yet another pay-to-play "standard"?  I've gone looking for specs
for tar before, but each time I have, I've been unable to find anything
that isn't behind a paywall of one sort or another (and thus a total
nonstarter for me).

Admittedly, I haven't looked recently.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: tar vs device special files
>> (It doesn't help that I haven't managed to find a clear spec for tar
>> format; the closest I've found so far is a description of what pax,
>> in its (supposedly-)tar-compatible mode, is supposed to read/write.)

> All of this can be found in:
> src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c

Thank you!  I'll have a look.

> If the libarchive tar doesn't see a "ustar  \0" (GNU tar) or "ustar"
> (POSIX tar) magic at 0x101 (see: tar_read_header()), it takes the file
> to be a non-POSIX old-style tar archive which (according to
> libarchive) doesn't store maj./min. nos. (see: struct
> archive_entry_header_ustar)

That is ... a significant deviation from historical practice, to the
extent that I would call it a bug in libarchive's tar support.  (I
don't think I've ever stumbled across any other tar that didn't
understand mtar's archives, though admittedly I don't pass archives
including device special files between implementations very often, so
if the incompatibility is limited to them I might well not notice.)

> Maybe your tar could supply a "ustar" magic char. seq. at 0x101 for
> libarchive.  (see: header_ustar() vs. header_old_tar())

I'll read the file you pointed at (though the path makes it sound like
a description of what libarchive chooses to do rather than anything
authoritative, though admittedly I don't know whether there _is_
anything authoritative when it comes to tar in general, as opposed to
specific tar implementations).

> Or, fix libarchive like this: [...]

If this isn't just a NetBSD oddity, I'd prefer to generate archives
that are more widely compatible.  Maybe even if it is.  Either way,
fixing libarchive is counterindicated (unless NetBSD is willing to take
up the changes, which strikes me as unlikely).

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
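
As a rough illustration of the "supply a ustar magic at 0x101" suggestion quoted above, here is a minimal sketch in Python. It assumes the ustar field layout from the POSIX pax specification (chksum at offset 148 = 0x94, magic at 257 = 0x101 followed by a two-byte version). The function names are hypothetical, and Python's `tarfile` module is used only as a stand-in to fabricate a sample header; this is not mtar's or libarchive's code. The one non-obvious step is that the checksum covers the magic bytes, so it has to be recomputed after stamping them in.

```python
import tarfile

BLOCK = 512
CHKSUM_OFF = 148   # 0x94, the header checksum field (8 bytes)
MAGIC_OFF = 257    # 0x101, magic (6 bytes) followed by version (2 bytes)


def header_checksum(header: bytes) -> int:
    """Sum every octet of the header, counting the chksum field as spaces."""
    blanked = header[:CHKSUM_OFF] + b" " * 8 + header[CHKSUM_OFF + 8:]
    return sum(blanked)


def add_ustar_magic(header: bytes) -> bytes:
    """Return a copy of an old-style header stamped with ustar magic/version."""
    if len(header) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    buf = bytearray(header)
    buf[MAGIC_OFF:MAGIC_OFF + 8] = b"ustar\x0000"   # "ustar\0" + version "00"
    # The checksum covers the magic too, so it has to be recomputed.
    buf[CHKSUM_OFF:CHKSUM_OFF + 8] = b"%06o\x00 " % header_checksum(bytes(buf))
    return bytes(buf)


def sample_old_style_header() -> bytes:
    """Stand-in for an old-style writer: a char-device header with no magic.

    The stored checksum is stale after blanking; add_ustar_magic() fixes it.
    """
    info = tarfile.TarInfo("dev/example")   # hypothetical member name
    info.type = tarfile.CHRTYPE
    info.devmajor, info.devminor = 3, 3
    hdr = info.tobuf(format=tarfile.USTAR_FORMAT)
    return hdr[:MAGIC_OFF] + bytes(8) + hdr[MAGIC_OFF + 8:]


if __name__ == "__main__":
    fixed = add_ustar_magic(sample_old_style_header())
    parsed = tarfile.TarInfo.frombuf(fixed, "utf-8", "surrogateescape")
    print(fixed[MAGIC_OFF:MAGIC_OFF + 8], parsed.devmajor, parsed.devminor)
```

Whether a real writer should also fill in the uname/gname fields once it claims to be ustar is a separate question that this sketch does not address.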
Re: tar vs device special files
On Sunday, October 29, 2023 12:40:06 AM CEST RVP wrote:
> On Sat, 28 Oct 2023, Mouse wrote:
> > I'm having trouble seeing what's responsible, and in particular am
> > wondering whether this is my bug or /bin/tar's bug or what.  (It
> > doesn't help that I haven't managed to find a clear spec for tar
> > format; the closest I've found so far is a description of what pax, in
> > its (supposedly-)tar-compatible mode, is supposed to read/write.)
>
> All of this can be found in:
>
> src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c

There is even a man page going over many of the variants and the details.

> Maybe your tar could supply a "ustar" magic char. seq. at 0x101 for
> libarchive.  (see: header_ustar() vs. header_old_tar())

I don't think any one else cares about pre-ustar.  Pretty much any reader and
writer around uses at least ustar and generally wants to have extended POSIX
as well when caring about large files.  I see no reasons for adding random
hacks for outdated tar programs with little real world exposure, chances are
high it is going to break something with other archives.

Joerg
Re: tar vs device special files
On Sat, 28 Oct 2023, Mouse wrote:

> I'm having trouble seeing what's responsible, and in particular am
> wondering whether this is my bug or /bin/tar's bug or what.  (It
> doesn't help that I haven't managed to find a clear spec for tar
> format; the closest I've found so far is a description of what pax, in
> its (supposedly-)tar-compatible mode, is supposed to read/write.)

All of this can be found in:

src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c

If the libarchive tar doesn't see a "ustar  \0" (GNU tar) or "ustar" (POSIX tar)
magic at 0x101 (see: tar_read_header()), it takes the file to be a non-POSIX
old-style tar archive which (according to libarchive) doesn't store maj./min.
nos. (see: struct archive_entry_header_ustar)

> The 9.1 /bin/tar tarball (hexdump -C) is
>
> 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000100  00 75 73 74 61 72 00 30  30 72 6f 6f 74 00 00 00  |.ustar.00root...|
> 00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000120  00 00 00 00 00 00 00 00  00 6f 70 65 72 61 74 6f  |.........operato|
> 00000130  72 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |r...............|
> 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 33 20  |.........000003 |
> 00000150  00 30 30 30 30 30 33 20  00 00 00 00 00 00 00 00  |.000003 ........|
> 00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>
> whereas mine is
>
> 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 33 20  |.........000003 |
> 00000150  00 30 30 30 30 30 33 20  00 00 00 00 00 00 00 00  |.000003 ........|
> 00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>
> Except for the stuff at offsets 0x100-0x131, they look pretty close to
> identical to me (the value at 0x94 is the header checksum), and that
> stuff is, as far as I can tell, owner name strings (which I'm not
> supplying, just using the numeric uid and gid values).  But the stock
> 9.1 tar seems to be taking the 000003 major and minor numbers as zero
> for reasons I don't understand, since it understands its own,
> apparently identical, major and minor numbers just fine.
>
> Any ideas?

Maybe your tar could supply a "ustar" magic char. seq. at 0x101 for libarchive.
(see: header_ustar() vs. header_old_tar())

Or, fix libarchive like this:

```
diff -urN a/src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c b/src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c
--- a/src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c	2019-07-24 13:50:23.000000000 +0000
+++ b/src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c	2023-10-28 22:10:28.778721000 +0000
@@ -1383,6 +1383,14 @@
 	if (err > err2)
 		err = err2;
 
+	/* Parse out device numbers only for char and block specials. */
+	if (header->typeflag[0] == '3' || header->typeflag[0] == '4') {
+		archive_entry_set_rdevmajor(entry, (dev_t)
+		    tar_atol(header->rdevmajor, sizeof(header->rdevmajor)));
+		archive_entry_set_rdevminor(entry, (dev_t)
+		    tar_atol(header->rdevminor, sizeof(header->rdevminor)));
+	}
+
 	tar->entry_padding = 0x1ff & (-tar->entry_bytes_remaining);
 	return (err);
 }
```

-RVP
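
For anyone wanting to poke at headers like the ones dumped above, here is a minimal sketch in Python of the decision being discussed. It assumes the ustar field offsets from the POSIX pax specification (typeflag at 156, magic at 257 = 0x101, devmajor at 329 = 0x149, devminor at 337 = 0x151). All function names are hypothetical, and Python's `tarfile` module is used only to fabricate sample headers. It is not libarchive's code: `classify()` mirrors the magic check described above, and `device_numbers()` reads the device fields for char and block specials regardless of magic, which is the behaviour the patch aims for.

```python
import tarfile

# Field offsets within the 512-byte header block (ustar layout).
TYPEFLAG_OFF = 156
MAGIC_OFF = 257      # 0x101
DEVMAJOR_OFF = 329   # 0x149
DEVMINOR_OFF = 337   # 0x151


def parse_octal(field: bytes) -> int:
    """Parse a NUL/space-terminated octal field; an empty field counts as 0."""
    text = field.split(b"\0", 1)[0].strip(b" ")
    return int(text, 8) if text else 0


def classify(header: bytes) -> str:
    """Classify a header by its magic: 'gnu', 'ustar', or 'old'."""
    magic = header[MAGIC_OFF:MAGIC_OFF + 8]
    if magic == b"ustar  \0":
        return "gnu"
    if magic[:5] == b"ustar":
        return "ustar"
    return "old"


def device_numbers(header: bytes):
    """Return (major, minor) for char/block specials, else None."""
    if header[TYPEFLAG_OFF:TYPEFLAG_OFF + 1] not in (b"3", b"4"):
        return None
    major = parse_octal(header[DEVMAJOR_OFF:DEVMAJOR_OFF + 8])
    minor = parse_octal(header[DEVMINOR_OFF:DEVMINOR_OFF + 8])
    return major, minor


def make_chardev_header(major: int, minor: int, fmt=tarfile.USTAR_FORMAT) -> bytes:
    """Fabricate a char-device header with Python's tarfile (sample data only)."""
    info = tarfile.TarInfo("dev/example")   # hypothetical member name
    info.type = tarfile.CHRTYPE
    info.devmajor, info.devminor = major, minor
    return info.tobuf(format=fmt)


if __name__ == "__main__":
    hdr = make_chardev_header(3, 3)
    # Blank the magic/version to imitate a header from an old-style writer.
    old = hdr[:MAGIC_OFF] + bytes(8) + hdr[MAGIC_OFF + 8:]
    for h in (hdr, old):
        print(classify(h), device_numbers(h))
```

The sketch ignores the checksum and every other field; it is only meant to show that the device numbers sit at the same offsets whether or not the magic is present.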