Hi!

Per Issue 8 Draft 2.1 (XCU, csplit, OPERANDS):
  84808 Each arg operand can be one of the following:

  84809 /rexp/[offset]
  84810 A file shall be created using the content of the lines from the current line up to, but
  84811 not including, the line that results from the evaluation of the regular expression
  84812 with offset, if any, applied. The regular expression rexp shall follow the rules for
  84813 basic regular expressions described in XBD Section 9.3 (on page 167). The
  84814 application shall use the sequence "\/" to specify a <slash> character within the
  84815 rexp. The optional offset shall be a positive or negative integer value representing a
  84816 number of lines. A positive integer value can be preceded by '+'. If the selection
  84817 of lines from an offset expression of this type would create a file with zero lines, or
  84818 one with greater than the number of lines left in the input file, the results are
  84819 unspecified. After the section is created, the current line shall be set to the line that
  84820 results from the evaluation of the regular expression with any offset applied. If the
  84821 current line is the first line in the file and a regular expression operation has not yet
  84822 been performed, the pattern match of rexp shall be applied from the current line to
  84823 the end of the file. Otherwise, the pattern match of rexp shall be applied from the
  84824 line following the current line to the end of the file.

  84825 %rexp%[offset]
  84826 Equivalent to /rexp/[offset], except that no file shall be created for the selected
  84827 section of the input file. The application shall use the sequence "\%" to specify a
  84828 <percent-sign> character within the rexp.

Per XBD 9.3.8:
  6011  9.3.8 BRE Expression Anchoring

  6012  A BRE can be limited to matching expressions that begin or end a string; this is called
  6013  ``anchoring''. The <circumflex> and <dollar-sign> special characters shall be considered BRE
  6014  anchors in the following contexts:
        ...
  6023  2. A <dollar-sign> ('$') shall be an anchor when used as the last character of an entire BRE.
  6024  The implementation may treat a <dollar-sign> as an anchor when used as the last
  6025  character of a subexpression. The <dollar-sign> shall anchor the expression (or optionally
  6026  subexpression) to the end of the string being matched; the <dollar-sign> can be said to
  6027  match the end-of-string following the last character.
  6028  3. A BRE anchored by both '^' and '$' shall match only an entire string. For example, the
  6029  BRE "^abcdef$" matches strings consisting only of "abcdef".

So, naturally, this must fail:
  $ echo 10 | csplit - '/^10$/'
per XCU, csplit, OPERANDS again:
  84836 An error shall be reported if an operand does not reference a line between the current position
  84837 and the end of the file.

Since a line is, as we all know, defined as (XBD):
  1721  3.179 Line
  1722  A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

And, naturally, the regex doesn't match: '$' anchors to the end of the
string, which lies after the terminating <newline>, not before it.
Conversely, the following must work:
  $ echo 10 | csplit - '/^10
  $/'

Behold (I've replaced echo with seq because NetBSD csplit has issues
when presented with just one line):
  netbsd-9.2$ seq 999 | csplit - '/^10$/'
  csplit: ^10$: no match
  netbsd-9.2$ seq 999 | csplit - '/^10
  $/'
  18
  3870
  netbsd-9.2$ wc -l *
         9 xx00
       990 xx01
       999 total
vs (coreutils 8.32-4+b1 and 9.1-1):
  debian$ seq 999 | csplit - '/^10$/'
  18
  3870
  debian$ wc -l *
     9 xx00
   990 xx01
   999 total
  debian$ seq 999 | csplit - '/^10
  $/'
  csplit: ‘/^10\n$/’: match not found
  3888

FreeBSD agrees with NetBSD.
The illumos gate (5.11; tribblix m25) agrees with coreutils.

From the user's standpoint, it doesn't seem to make sense to require
specifying the newline literally, since the input is defined to consist
of lines only /anyway/.

Might it make sense to specify that the rexps are to be compiled with
REG_NEWLINE, or the equivalent?

Best,
наб
