Package: coreutils
Version: 8.32-4
Severity: normal
File: /usr/bin/cut

Dear Maintainer,

POSIX.1-2008 says:
-- >8 --
-n  Do not split characters. When specified with the -b option, each
    element in list of the form low-high (<hyphen-minus>-separated
    numbers) shall be modified as follows:

     *  If the byte selected by low is not the first byte of a
        character, low shall be decremented to select the first byte of
        the character originally selected by low. If the byte selected
        by high is not the last byte of a character, high shall be
        decremented to select the last byte of the character prior to
        the character originally selected by high, or zero if there is
        no prior character. If the resulting range element has high
        equal to zero or low greater than high, the list element shall
        be dropped from list for that input line without causing an
        error.

    Each element in list of the form low- shall be treated as above
    with high set to the number of bytes in the current line, not
    including the terminating <newline>. Each element in list of the
    form -high shall be treated as above with low set to 1. Each
    element in list of the form num (a single number) shall be treated
    as above with low set to num and high set to num.
-- >8 --

With a more succinct exemplary text driving the point home:
-- >8 --
Earlier versions of the cut utility worked in an environment where
bytes and characters were considered equivalent (modulo <backspace> and
<tab> processing in some implementations). In the extended world of
multi-byte characters, the new -b option has been added. The -n option
(used with -b) allows it to be used to act on bytes rounded to
character boundaries. The algorithm specified for -n guarantees that:

    cut -b 1-500 -n file > file1
    cut -b 501- -n file > file2

ends up with all the characters in file appearing exactly once in file1
or file2. (There is, however, a <newline> in both file1 and file2 for
each <newline> in file.)
-- >8 --

So, compare a conforming implementation:
-- >8 --
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 1-5
яй
ЯЙ
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 6-
цо
ЦО
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 1-4
яй
ЯЙ
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 5-
цо
ЦО
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 1-3
я
Я
$ printf 'яйцо\nЯЙЦО' | ./out/cmd/cut -nb 4-
йцо
ЙЦО
-- >8 --

With the garbage that GNU cut spews:
-- >8 --
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-5
яй�
ЯЙ�
$ printf 'яйцо\nЯЙЦО' | cut -nb 6-
�о
�О
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-4
яй
ЯЙ
$ printf 'яйцо\nЯЙЦО' | cut -nb 5-
цо
ЦО
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-3
я�
Я�
$ printf 'яйцо\nЯЙЦО' | cut -nb 4-
�цо
�ЦО
-- >8 --

Or, without the luxury of REPLACEMENT CHARACTER:
-- >8 --
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-5 | hexdump -C
00000000  d1 8f d0 b9 d1 0a d0 af  d0 99 d0 0a              |............|
0000000c
$ printf 'яйцо\nЯЙЦО' | cut -nb 6- | hexdump -C
00000000  86 d0 be 0a a6 d0 9e 0a                           |........|
00000008
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-4 | hexdump -C
00000000  d1 8f d0 b9 0a d0 af d0  99 0a                    |..........|
0000000a
$ printf 'яйцо\nЯЙЦО' | cut -nb 5- | hexdump -C
00000000  d1 86 d0 be 0a d0 a6 d0  9e 0a                    |..........|
0000000a
$ printf 'яйцо\nЯЙЦО' | cut -nb 1-3 | hexdump -C
00000000  d1 8f d0 0a d0 af d0 0a                           |........|
00000008
$ printf 'яйцо\nЯЙЦО' | cut -nb 4- | hexdump -C
00000000  b9 d1 86 d0 be 0a 99 d0  a6 d0 9e 0a              |............|
0000000c
-- >8 --

If we consult the manual, we can see:
-- >8 --
$ man cut | grep -C3 -- -n
              select only these fields; also print any line that contains no delimiter character, unless the -s op‐
              tion is specified

       -n     (ignored)

       --complement
              complement the set of selected bytes, characters or fields
-- >8 --

If I hadn't seen the dog-water I was given, I would've assumed this was
a joke, and a bad one. But I have, and I don't think I can classify this
as anything but "actively malicious".

Either don't recognise -n at all, or implement it. Don't destroy the
input while openly flouting the standard.

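For reference, the range adjustment in the POSIX text quoted at the top can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of either implementation compared here: it assumes the line is valid UTF-8 (so the continuation bytes are exactly those of the form 10xxxxxx), it handles a single list element, and the names is_continuation, adjust_range and cut_nb are made up for this sketch.

```python
from typing import Optional, Tuple


def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def adjust_range(line: bytes, low: int, high: int) -> Optional[Tuple[int, int]]:
    """Round a 1-based, inclusive byte range to character boundaries.

    Returns the adjusted (low, high), or None if the element is dropped.
    Assumption: `line` is valid UTF-8 without its terminating newline.
    """
    n = len(line)
    if low > n:
        return None  # the range starts past the end of the line
    high = min(high, n)

    # low: step back to the first byte of the character it falls in.
    while low > 1 and is_continuation(line[low - 1]):
        low -= 1

    # high: if the following byte continues the same character, high is
    # not a last byte; step back to the last byte of the prior character.
    if high < n and is_continuation(line[high]):
        while high >= 1 and is_continuation(line[high - 1]):
            high -= 1
        high -= 1  # reaches zero when there is no prior character

    if high == 0 or low > high:
        return None  # dropped for this line, without an error
    return low, high


def cut_nb(line: bytes, low: int, high: Optional[int] = None) -> bytes:
    """Select bytes low..high of `line`; high=None means the form 'low-'."""
    adjusted = adjust_range(line, low, len(line) if high is None else high)
    if adjusted is None:
        return b""
    lo, hi = adjusted
    return line[lo - 1:hi]


if __name__ == "__main__":
    for text in ("яйцо", "ЯЙЦО"):
        raw = text.encode("utf-8")
        # Corresponds to `cut -nb 1-5` and `cut -nb 6-` on each line.
        print(cut_nb(raw, 1, 5).decode("utf-8"), cut_nb(raw, 6).decode("utf-8"))
```

With the two test lines used above, the 1-5 and 6- selections come out as whole characters, and for any split point k the pieces from 1-k and (k+1)- concatenate back to the original line, which is the guarantee the rationale describes.
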
наб

-- System Information:
Debian Release: 11.0
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: x32 (x86_64)
Foreign Architectures: amd64, i386

Kernel: Linux 5.10.0-8-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages coreutils depends on:
ii  libacl1      2.2.53-10
ii  libattr1     1:2.4.48-6
ii  libc6        2.31-16
ii  libgmp10     2:6.2.1+dfsg-1
ii  libselinux1  3.1-3

coreutils recommends no packages.

coreutils suggests no packages.

-- no debconf information