Greetings. I'm reporting a bug in csplit from coreutils 5.2.1, compiled from source on a SuSE 9.3 system. It seems this bug was reported over a year ago (see <http://lists.gnu.org/archive/html/bug-coreutils/2004-08/msg00112.html>) but was never squashed.
In short, csplit produces corrupt output when the input file contains very long lines. An example file is at <http://www.dfki.uni-kl.de/~miller/tmp/wikipedia.xml>, an XML file containing three articles from Wikipedia. The second article was vandalized by a spammer who inserted a ridiculously long line (42280 characters) full of links. If I try to split this file with

    $ csplit wikipedia.xml '/<page>/' '{*}'

then the file with the second article, xx02, is garbled at the beginning of the long line. See <http://www.dfki.uni-kl.de/~miller/tmp/xx02>.

Regards,
Tristan

-- 
Tristan Miller [en,(fr,de,ia)]
http://www.nothingisreal.com/

Space is limited
In a haiku, so it's hard
To finish what you
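For anyone who wants to check a given csplit build without downloading the example file, the following is a minimal sketch of a self-contained reproduction. It makes several assumptions: the file name sample.xml, the <mediawiki> wrapper, and the filler content are all made up for illustration, and only the 42280-character line length and the csplit invocation come from the report above. Since the pieces produced by csplit should concatenate back to the original input, the script compares the rejoined pieces against the original with cmp.

```shell
#!/bin/sh
# Hypothetical reproduction sketch: sample.xml and its contents are synthetic;
# only the line length (42280) and the csplit arguments mirror the report.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build one very long line of 42280 'a' characters.
long_line=$(head -c 42280 /dev/zero | tr '\0' 'a')

# Three <page> sections; the second one carries the long line.
{
  printf '<mediawiki>\n'
  printf '<page>\nfirst article\n</page>\n'
  printf '<page>\n%s\n</page>\n' "$long_line"
  printf '<page>\nthird article\n</page>\n'
  printf '</mediawiki>\n'
} > sample.xml

# Same invocation as in the report, with -s to silence the byte counts.
csplit -s sample.xml '/<page>/' '{*}'

# Rejoin the pieces in order and compare byte-for-byte with the input.
cat xx* > rejoined.xml
if cmp -s sample.xml rejoined.xml; then
  result=intact
else
  result=corrupt
fi
echo "csplit output is $result"
```

If the script reports "corrupt", xx02 in the temporary directory is the piece to inspect, since that is where the long line should land with this layout.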
_______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils