On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote:
> Richard Lyons wrote:
> > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
> >> Richard Lyons wrote:
[...]
> >>> I need a script to read a text file (actually tex) and parse lines of a
> >>> table that may or may not span newline characters in the file.
> >>> Basically, there are lines of the form
> >>>
> >>> {some text} & {some more text} & {text c} & {text d} \\
> >>>
> >>> where the braces are only for clarity and do not occur in the files, and
> >>> where the bits of text may include whitespace which may include newline
> >>> characters. There may also be escaped ampersands in the text ('\&'), and
> >>> the text fragments may be empty.
> >>>
> >>> I suspect perl may be the way forward. I need to be able to read each
> >>> file, parse each set of three ampersands with a double backslash
> >>> breaking it into four substrings, manipulate the substrings and write
> >>> the file anew. A typical manipulation will be to take text c and copy
> >>> it to text d. I shall also try to strip leading and trailing whitespace
> >>> to tidy up the file.
> >>>
> >> please give real examples of the text you have, as well as more info about
> >> what processing you will do with it.
[...]
> >
> > \mbox{Walls} &Plain plastered and painted white. &GC but to soiled
> > around switch, RHS as entering, HL marks. OW nail near centre, some
> > blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
> > corner, white painted, cracks at junctions. & \\
> >
> > and here is another:
> >
> > &catch, diecast \& epoxy coated with security lock & GC &\\
> >
>
> well, I'd say something along these lines assuming that you have $l
> populated with the entire piece you want.
> Also note that this attempts to avoid use of regexps where possible, as
> they tend to be slow and hard to read. Not that I dislike regexps, but I
> don't think they're necessary here. Also note that none of this code has
> been tested, it's the product of about 5 minutes of hacking.
>
> my @phrases = split('&', $l);
> {
>     my @tmp;
>     while(my $phrase = shift @phrases) {
>         if (substr($phrase, -2) eq '\') {
>             my $tmp = $phrase .'&'. (shift @phrases);
>         }
>         push @tmp, $phrase;
>     }
>     @phrases = @tmp;
> }
>
> # remove trailing or leading whitespace
> foreach my $phrase (@phrases) {
>     $phrase =~ s/^\s//; #remove leading spaces
>     $phrase =~ s/\s$//; # remove trailing spaces
>     $phrase =~ s/\n/ /g; # change all new-line chars to spaces
> }
>
> # now reconstruct your text however you want it.
> # I have a good (free, public-domain) line splitter if you need one.

I would like that.

The script fragments were a great help, thanks. The main bug was that $tmp
is unnecessary: it should just be $phrase.

I'd post the whole script (well, two actually: I used a bash script to set
things up and just called the perl to do the dirty work), but it is such a
specific use that it would be of no use to anyone else. A pity, really,
after spending so many hours on it. Still, at least I learned some perl.

I would recommend to anyone else looking for a beginners' introduction to
perl

  http://www.perltraining.com.au/notes.html

as well as

  http://mailman.linuxchix.org/pipermail/courses/2003-September/001344.html

This second URL is lesson 10, which I have given because the earlier
lessons do not link forward to the later ones. There is, of course, a mass
of other stuff, not least http://cpan.org and all the perldoc info.

So thanks, case closed.

-- 
richard
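A possible corrected, self-contained version of the quoted fragment follows. It is a sketch under stated assumptions, not the script Richard actually used. Besides the $tmp fix Richard mentions, it addresses a few other problems in the fragment: '\' is an unterminated string in Perl (a lone backslash is written '\\') and substr($phrase, -2) takes two characters where one is needed; 'while (my $phrase = shift @phrases)' stops at the first empty field, although fields may be empty; split drops trailing empty fields unless given a limit of -1; and s/^\s// removes only a single whitespace character. The helper names split_row and tidy_rows are hypothetical, the input is assumed to contain only the body rows of one table (each ended by \\), and copying text c into text d stands in for whatever manipulation is really needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one table row (the text before its closing \\) into fields on
# every '&' that is not written as '\&'.
sub split_row {
    my ($row) = @_;
    # A limit of -1 keeps trailing empty fields, e.g. the last one in
    # "... & GC &".
    my @pieces = split /&/, $row, -1;
    my @fields;
    # Loop on the array, not on "my $p = shift @pieces": an empty field
    # is false and would end that loop early.
    while (@pieces) {
        my $field = shift @pieces;
        # A piece ending in a backslash means the '&' after it was an
        # escaped '\&', so glue the next piece back on.
        while ($field =~ /\\\z/ && @pieces) {
            $field .= '&' . shift(@pieces);
        }
        push @fields, $field;
    }
    for my $f (@fields) {    # $f is an alias, so edits change @fields
        $f =~ s/^\s+//;      # strip all leading whitespace
        $f =~ s/\s+$//;      # strip all trailing whitespace
        $f =~ s/\s+/ /g;     # newlines and runs of blanks become one space
    }
    return @fields;
}

# Hypothetical helper: rewrite a block of rows, each ended by \\.
# Assumes the text holds nothing but the rows themselves.
sub tidy_rows {
    my ($text) = @_;
    my @out;
    # /\\\\/ matches the two literal backslashes that end a LaTeX row.
    for my $row (split /\\\\/, $text) {
        next unless $row =~ /\S/;      # skip whitespace after the last row
        my @f = split_row($row);
        $f[3] = $f[2] if @f == 4;      # example manipulation: copy c into d
        push @out, join(' & ', @f) . ' \\\\';
    }
    return join("\n", @out) . "\n";
}

# Rewrite each file named on the command line in place; with no
# arguments nothing is touched.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} tidy_rows($text);
    close $out or die "Cannot close $file: $!";
}
```

It would be run as "perl tidy_table.pl rows.tex" (hypothetical names) and overwrites the file, so keep a copy. A \hline or similar command between rows would be folded into the next row's first field, so those are best stripped or protected in the setup step.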