For reference (I don't know whether this is the same bug or merely a
related one), here's the original bug I ran into:

-- >8 --
$ rm xx*; seq 999 | csplit - /10/-5 30 /10/-5 {2} %10% 110
8
70
195
3
3
28
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4

==> xx01 <==
5
6
7
8
9
10
...
29

==> xx02 <==
30
...
94

==> xx03 <==
95

==> xx04 <==
96

==> xx05 <==
103
104
105
106
107
108
109

==> xx06 <==
110
...
999
-- >8 --

Where have 100..102 gone?
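
A quick way to read the size column (a sketch only; range_bytes is a
made-up helper, and it assumes seq and wc behave as usual): the 28 that
csplit printed for xx05 is exactly lines 103..109, whereas a piece
starting at line 100 would be 40 bytes.

```shell
# range_bytes FIRST LAST: bytes occupied by lines FIRST..LAST of seq output,
# i.e. the number csplit would print for a piece holding exactly those lines.
range_bytes() {
    seq "$1" "$2" | wc -c | tr -d ' '
}

range_bytes 103 109   # 28: what xx05 holds above
range_bytes 100 109   # 40: what xx05 would hold if it started at line 100
```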

Compare with the % swapped for / (s:%:/:g):
-- >8 --
$ rm xx*; seq 999 | csplit - /10/-5 30 /10/-5 {2} /10/ 110
8
70
195
3
3
21
28
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4

==> xx01 <==
5
6
7
8
9
10
...
29

==> xx02 <==
30
...
94

==> xx03 <==
95

==> xx04 <==
96

==> xx05 <==
97
98
99
100
101
102

==> xx06 <==
103
104
105
106
107
108
109

==> xx07 <==
110
...
999
-- >8 --

And compare my diagram:
       /10/-5  30     /10/-5 {2}   %10%     110      expr
                      0  1  2                        rep
  1-4  5-29    30-94  95 96 97-99  100-109  110-999  line
  00   01      02     03 04 05     06       07       file
When the "10" regex is %-wrapped, file 05 (lines 97-99) is not allocated,
as expected, and the later files move down by one.
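
If the diagram is right, the sizes csplit ought to print follow from the
piece boundaries alone. A sketch of that arithmetic, assuming seq-style
input; expected_sizes is a hypothetical helper, and the leading % marking
a discarded piece is my own convention here, not csplit syntax:

```shell
# expected_sizes TOTAL START...: for `seq TOTAL` cut into pieces beginning
# at each START, print the byte size of every piece that is kept.
# A START written as %N marks the piece beginning at line N as discarded.
expected_sizes() {
    total=$1; shift
    # Append a sentinel start one past the last line so every piece has an end.
    printf '%s\n' "$@" "$((total + 1))" | awk '
        # Bytes taken by lines first..last: the digits plus one newline each.
        function bytes(first, last,    n, sum) {
            sum = 0
            for (n = first; n <= last; n++) sum += length(n "") + 1
            return sum
        }
        {
            skip = ($0 ~ /^%/)
            start = $0; sub(/^%/, "", start); start += 0
            # Each new start closes the previous piece; print it unless discarded.
            if (NR > 1 && !prev_skip) print bytes(prev_start, start - 1)
            prev_start = start; prev_skip = skip
        }'
}

expected_sizes 999 1 5 30 95 96 97 100 110    # /10/ variant of the diagram
expected_sizes 999 1 5 30 95 96 %97 100 110   # %10% variant: 97-99 discarded
```

The second call gives 8 70 195 3 3 40 3560; the first gives 9 and 40
where the /10/ run above printed 21 and 28.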

POSIX leaves unspecified what happens when applying an expression would
produce a zero-sized file, which is why it's legal for coreutils csplit
to always eject a line; for comparison, csplit on NetBSD and others
creates 2 empty files for the /10/-5 {2} expression, for obvious reasons.

Consider therefore the same diagram but vertical
(annotation signifying file start):
-- >8 --
1    xx00
2
3
4
5    xx01
6
7
8
9
10
...
29
30   xx02
...
94
95    xx03
96    xx04
97    xx05
98
99
100   xx06
101
102
103
104
105
106
107
108
109
110   xx07
...
999
-- >8 --

Since in the csplit language, for a constant input, all expressions sans
%expr% can be reduced to line number expressions¹,
here's the equivalent invocation:
-- >8 --
$ rm xx*; seq 999 | csplit - 5 30 95 96 97 %10% 110
8
70
195
3
3
40
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4

==> xx01 <==
5
6
7
8
9
10
...
29

==> xx02 <==
30
...
94

==> xx03 <==
95

==> xx04 <==
96

==> xx05 <==
100
101
102
103
104
105
106
107
108
109

==> xx06 <==
110
...
999
-- >8 --
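
The reduction used for the invocation above can be sketched like this
(my own helper, nothing csplit provides; first_match is a made-up name
and the input is hard-coded to seq 999): for a fixed input, a regex
expression with an offset names a plain line number. It does not model
the {2} repeats, where the always-eject-a-line rule is what gives 96
and 97.

```shell
# first_match START REGEX OFFSET: number of the first line at or after START
# in `seq 999` that matches REGEX, adjusted by OFFSET.
first_match() {
    seq 999 | awk -v from="$1" -v re="$2" -v off="$3" \
        'NR >= from && $0 ~ re { print NR + off; exit }'
}

first_match 1  10 -5   # 5
first_match 30 10 -5   # 95
first_match 97 10 0    # 100
```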

So, again, this appears to be lookbehind rearing its filthy head.

Best,
наб

¹ I don't think this is strictly true for the strict POSIX dialect
  without a forward-progress hatch, like NetBSD, but it is true for the
  coreutils dialect where a regex expression always ejects a line.
  Whatever, you get the point.
