Hi!

Per Issue 8 Draft 2.1 (XCU, csplit, OPERANDS):

84808 Each arg operand can be one of the following:
84809 /rexp/[offset]
84810 A file shall be created using the content of the lines from the current line up to, but
84811 not including, the line that results from the evaluation of the regular expression
84812 with offset, if any, applied. The regular expression rexp shall follow the rules for
84813 basic regular expressions described in XBD Section 9.3 (on page 167). The
84814 application shall use the sequence "\/" to specify a <slash> character within the
84815 rexp. The optional offset shall be a positive or negative integer value representing a
84816 number of lines. A positive integer value can be preceded by '+'. If the selection
84817 of lines from an offset expression of this type would create a file with zero lines, or
84818 one with greater than the number of lines left in the input file, the results are
84819 unspecified. After the section is created, the current line shall be set to the line that
84820 results from the evaluation of the regular expression with any offset applied. If the
84821 current line is the first line in the file and a regular expression operation has not yet
84822 been performed, the pattern match of rexp shall be applied from the current line to
84823 the end of the file. Otherwise, the pattern match of rexp shall be applied from the
84824 line following the current line to the end of the file.
84825 %rexp%[offset]
84826 Equivalent to /rexp/[offset], except that no file shall be created for the selected
84827 section of the input file. The application shall use the sequence "\%" to specify a
84828 <percent-sign> character within the rexp.

Per XBD 9.3.8:

6011 9.3.8 BRE Expression Anchoring
6012 A BRE can be limited to matching expressions that begin or end a string; this is called
6013 ``anchoring’’. The <circumflex> and <dollar-sign> special characters shall be considered BRE
6014 anchors in the following contexts:
...
6023 2. A <dollar-sign> ('$') shall be an anchor when used as the last character of an entire BRE.
6024 The implementation may treat a <dollar-sign> as an anchor when used as the last
6025 character of a subexpression. The <dollar-sign> shall anchor the expression (or optionally
6026 subexpression) to the end of the string being matched; the <dollar-sign> can be said to
6027 match the end-of-string following the last character.
6028 3. A BRE anchored by both '^' and '$' shall match only an entire string. For example, the
6029 BRE "^abcdef$" matches strings consisting only of "abcdef".

So, naturally, for this:

$ echo 10 | csplit - '/^10$/'

XCU, csplit, OPERANDS again:

84836 An error shall be reported if an operand does not reference a line between the current position
84837 and the end of the file.

Since a line is, as we all know, defined as (XBD):

1721 3.179 Line
1722 A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

And, naturally, the regex doesn't match the end of the line, since the <newline> sits between "10" and the end of the string. Conversely, the following must work:

$ echo 10 | csplit - '/^10
$/'

Behold (I've substituted seq for echo because NetBSD csplit has issues when presented with just one line):

netbsd-9.2$ seq 999 | csplit - '/^10$/'
csplit: ^10$: no match
netbsd-9.2$ seq 999 | csplit - '/^10
$/'
18
3870
netbsd-9.2$ wc -l *
9 xx00
990 xx01
990 total

vs (coreutils 8.32-4+b1 and 9.1-1):

debian$ seq 999 | csplit - '/^10$/'
18
3870
debian$ wc -l *
9 xx00
990 xx01
999 total
debian$ seq 999 | csplit - '/^10
$/'
csplit: ‘/^10\n$/’: match not found
3888

FreeBSD agrees with NetBSD. The illumos gate (5.11; tribblix m25) agrees with coreutils.

From the user's standpoint, it doesn't seem to make sense to require specifying the newline literally, since the input is defined to consist of lines only /anyway/.

Might it make sense to specify that the rexps are to be compiled with REG_NEWLINE, or the equivalent?

Best,
наб