Greetings.

I'm reporting a bug in csplit from coreutils 5.2.1, compiled from source on 
a SuSE 9.3 system.  It seems this bug was reported over a year ago (see 
<http://lists.gnu.org/archive/html/bug-coreutils/2004-08/msg00112.html>) 
but it was never squashed.

In short, csplit produces corrupt output when the input file contains very 
long lines.  An example file is at 
<http://www.dfki.uni-kl.de/~miller/tmp/wikipedia.xml>, an XML file 
containing three articles from Wikipedia.  The second article was 
vandalized by a spammer who inserted a ridiculously long line (42280 
characters) full of links.

If I try to split this file with

$ csplit wikipedia.xml '/<page>/' '{*}'

then the file with the second article, xx02, is garbled at the beginning of 
the long line.  See <http://www.dfki.uni-kl.de/~miller/tmp/xx02>.
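
In case the sample file is unavailable, a synthetic stand-in might be built 
along the following lines.  This is only a sketch: the file name 
synthetic.xml, the <mediawiki> wrapper and the filler character are 
assumptions made for illustration, and it has not been confirmed that such 
a file triggers the same corruption as the real one.  It relies on 
`head -c` and the GNU '{*}' repeat count.  The check at the end rests on 
the fact that csplit's output pieces, concatenated in order, should be 
identical to the input.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; names and contents are illustrative.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# One very long line: 42280 characters, the length given in the report.
longline=$(head -c 42280 /dev/zero | tr '\0' 'x')

# A header line, then three <page> blocks; the second holds the long line,
# so it should end up in xx02 as in the report.
{
    printf '<mediawiki>\n'
    printf '<page>\nfirst article\n</page>\n'
    printf '<page>\n%s\n</page>\n' "$longline"
    printf '<page>\nthird article\n</page>\n'
    printf '</mediawiki>\n'
} > synthetic.xml

# Same patterns as in the report; -s only suppresses the byte counts.
csplit -s synthetic.xml '/<page>/' '{*}'

# The pieces, concatenated in order, should reproduce the input exactly.
if cat xx* | cmp -s - synthetic.xml; then
    echo "pieces match input"
else
    echo "pieces differ from input"
fi
```

If the pieces differ, `cmp xx02 expected` against a hand-extracted copy of 
the second block would show where the damage starts.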

Regards,
Tristan

-- 
   _
  _V.-o  Tristan Miller [en,(fr,de,ia)]  ><  Space is limited
 / |`-'  -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  <>  In a haiku, so it's hard
(7_\\    http://www.nothingisreal.com/   ><  To finish what you
