
I'm reporting a bug in csplit from coreutils 5.2.1, compiled from source on a 
SuSE 9.3 system.  It seems this bug was previously reported over a year 
ago (see <>), but it was never squashed.

In short, csplit produces corrupt output when the input file contains very 
long lines.  An example file is at 
<>, an XML file 
containing three articles from Wikipedia.  The second article was 
vandalized by a spammer who inserted a ridiculously long line (42280 
characters) full of links.

If I try to split this file with

$ csplit wikipedia.xml '/<page>/' '{*}'

then the file with the second article, xx02, is garbled at the beginning of 
the long line.  See <>.
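
For anyone without the sample file, here is a rough sketch of a synthetic 
reproduction.  It is only an approximation: the file names (sample.xml, 
rejoined.xml), the filler content, and the temporary directory are my own 
assumptions, not the original Wikipedia data.  It builds three <page> blocks, 
puts a single 42280-character line in the second, splits the file the same 
way, and checks whether the pieces concatenate back to the input.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; sample.xml is generated filler,
# not the original wikipedia.xml from the report.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# One 42280-character line, matching the length quoted in the report.
long_line=$(head -c 42280 /dev/zero | tr '\0' 'x')

# Three <page> blocks; only the second contains the long line.
{
  printf '<mediawiki>\n'
  printf '<page>\nfirst article\n</page>\n'
  printf '<page>\n%s\n</page>\n' "$long_line"
  printf '<page>\nthird article\n</page>\n'
  printf '</mediawiki>\n'
} > sample.xml

# Same split as in the report; -s only suppresses the byte counts.
csplit -s sample.xml '/<page>/' '{*}'

# The pieces, concatenated in order, should reproduce the input exactly.
cat xx* > rejoined.xml
if cmp -s sample.xml rejoined.xml; then
  echo "OK: pieces match input"
else
  echo "MISMATCH: csplit output differs from input"
fi
```

With this layout the long line should land in xx02, as in the report.  A 
mismatch here would point at the same corruption; a match only means this 
particular synthetic input did not trigger it.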


  _V.-o  Tristan Miller [en,(fr,de,ia)]  ><  Space is limited
 / |`-'  -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  <>  In a haiku, so it's hard
(7_\\   ><  To finish what you


Bug-coreutils mailing list
