Package: gawk
Version: 1:5.2.1-2 1:5.1.0-1

Hi,

I initially ran into this issue on Debian 11 Bullseye, but I can also reproduce it in Debian Unstable as of now:

We have logs which separate fields with "||", i.e. two pipe characters. (Yeah, likely not ideal, but that's a given. :-)

With mawk I can parse them easily:

$ echo 'a||b' | mawk -F'\|\|' '{print $1"X"$2}'
aXb

(Backslashes because a multicharacter FS is considered to be a regular expression and hence the special character pipe needs to be escaped. mawk also complains otherwise, IMHO correctly.)

gawk, though, behaves strangely and, above all, inconsistently:

$ echo 'a||b' | gawk -F'\|\|' '{print $1"X"$2}'
gawk: warning: escape sequence `\|' treated as plain `|'
a||bX

OK, so '\|' should be written as just '|'? Unexpected, but OK. Let's do that:

$ echo 'a||b' | gawk -F'||' '{print $1"X"$2}'
a||bX

No more warnings, but the output is as wrong as before. It's also not that it treated the pipe as a regular expression (in which case it would probably match the empty string twice and should probably output something like "a|"). I would, though, have kind of expected that "||" is considered to be a regular expression and hence would require the backslash.

Using e.g.

$ echo 'a||b' | gawk 'FS="\|\|" {print $1"X"$2}'
gawk: cmd. line:1: warning: escape sequence `\|' treated as plain `|'
a||bX

seems to make no difference.

What does work as expected with gawk (and mawk), though, is this:

$ echo 'a||b' | gawk -F'[|][|]' '{print $1"X"$2}'
aXb

Interestingly, if only a single pipe character is used as delimiter, it works as expected again:

$ echo 'a|b' | gawk -F'\|' '{print $1"X"$2}'
gawk: warning: escape sequence `\|' treated as plain `|'
aXb
$ echo 'a|b' | gawk -F'|' '{print $1"X"$2}'
aXb

So the bug seems to appear only if at least two pipes are used as delimiter. (It behaves the same way with three pipes as with two pipes.)

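For anyone hitting the same problem, the bracket-expression workaround above can be wrapped in a small helper. This is only a minimal sketch: the function name split_on_double_pipe is made up for illustration, and it calls plain "awk" on the assumption that whichever awk is installed treats a multicharacter FS as a regular expression (I have only confirmed the bracket form with gawk and mawk).

```shell
#!/bin/sh
# Hypothetical helper (the name is not from any package): split a line on a
# literal "||" and print the first two fields joined by "X".
# Each pipe sits in its own bracket expression, so no backslash escaping is
# involved and the awk implementation has nothing to reinterpret.
split_on_double_pipe() {
    printf '%s\n' "$1" | awk -F'[|][|]' '{ print $1 "X" $2 }'
}

split_on_double_pipe 'a||b'
```

A single pipe inside a field is left alone, since the separator only matches two consecutive pipes.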
Part of the bug, or a separate bug, might be that it complains even in the two-character version (hence expected to be a regexp) about "\|" being interpreted as plain "|", which from my point of view is only correct in the one-character (plus escaping) variant '\|', but not for '\|\|'.

Counterexamples:

$ echo 'afbgc' | awk -F 'f|g' '{print $1, $2, $3}'
a b c
$ echo 'afbgc' | awk -F 'f\|g' '{print $1, $2, $3}'
awk: warning: escape sequence `\|' treated as plain `|'
a b c
$ echo 'af|gc' | awk -F 'f\|g' '{print $1, $2}'
awk: warning: escape sequence `\|' treated as plain `|'
a |

In the last example it IMHO should not have replaced the "\|" with just a "|", which is also not "plain" but a special character that was meant to be escaped. The wanted output was "a c".

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (600, 'testing'), (500, 'unstable-debug'), (500, 'buildd-unstable'), (110, 'experimental'), (1, 'experimental-debug'), (1, 'buildd-experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.5.0-4-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
LSM: AppArmor: enabled

Versions of packages gawk depends on:
ii  libc6         2.37-13
ii  libgmp10      2:6.3.0+dfsg-2
ii  libmpfr6      4.2.1-1
ii  libreadline8  8.2-3
ii  libsigsegv2   2.14-1

gawk recommends no packages.

Versions of packages gawk suggests:
pn  gawk-doc  <none>

-- no debconf information