Good day,

I am experiencing an odd behaviour in csplit which may actually be a
bug.

I am testing this against the code cloned from
https://github.com/coreutils/coreutils.git, on the commit described by
git as v8.32-52-gc0e5f8c59.

Suppose I have the following YAML file:

    ==> test.yaml <==
    value1: 123
    ---
    value2: 456
    ---
    value3: 789

and I want to split it at the '---' lines. My first attempt was:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

which outputs:

    12
    12
    16

and creates the following files:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    ---
    value3: 789

The last piece still begins with the '---' line, even though the
matching line at the start of the second piece was suppressed.

Now, if I try again with:

    csplit -z --suppress-matched test.yaml '/^---$/' '{*}'

I get:

    12
    12
    12

and:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    value3: 789

where the last part does not contain the matched line, as expected.

While investigating the problem, I noticed that match suppression is
done at the beginning of process_regexp. With a fixed repeat count, as
in the first example, the function is called twice, and then
split_file simply dumps the rest of the file into the last output file.

This means that the two calls to process_regexp will:

* suppress nothing for call #1 because nothing has been matched yet;
* suppress the first match in call #2.

The rest of the file is then dumped, but nothing ever suppresses the
second match, so it appears at the top of the last piece. With '{*}'
repetition, the end of the file is instead reached inside
process_regexp, whose final call first suppresses the previously
matched line and then writes out the rest of the file itself.

I came up with the attached patch, which simply moves match suppression
to the end of process_regexp. With this modification, the invocation:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

now produces:

    12
    12
    12

and:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    value3: 789

which is what I would expect.

diff --git a/src/csplit.c b/src/csplit.c
index 9bd9c43b5..93ff60dc6 100644
--- a/src/csplit.c
+++ b/src/csplit.c
@@ -803,9 +803,6 @@ process_regexp (struct control *p, uintmax_t repetition)
   if (!ignore)
     create_output_file ();
 
-  if (suppress_matched && current_line > 0)
-    remove_line ();
-
   /* If there is no offset for the regular expression, or
      it is positive, then it is not necessary to buffer the lines. */
 
@@ -893,6 +890,9 @@ process_regexp (struct control *p, uintmax_t repetition)
 
   if (p->offset > 0)
     current_line = break_line;
+
+  if (suppress_matched)
+    remove_line ();
 }
 
 /* Split the input file according to the control records we have built. */
bug#42764: csplit does n... (Emanuele Giacomelli via GNU coreutils Bug Reports)