Re: another script query (perl?) -- OT
On Sat, Sep 15, 2007 at 09:22:45PM +1200, Chris Bannister wrote: > On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote: > > Hi, all you script wizards. > > > > I thought this would be easy, but I haven't found anything to crib > > from... > > > > I need a script to read a text file (actually tex) and parse lines of a > > table that may or may not span newline characters in the file. > > Basically, there are lines of the form > > > >{some text} & {some more text} & {text c} & {text d} \\ > > > Wrong list, for this sort of question. This list is _supposed_ to be for > Debian specific usage questions. Yes, but these guys are _good_. Problem was solved about a week back thanks to them. Have a nice day! -- richard -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: another script query (perl?)
On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote: > Hi, all you script wizards. > > I thought this would be easy, but I haven't found anything to crib > from... > > I need a script to read a text file (actually tex) and parse lines of a > table that may or may not span newline characters in the file. > Basically, there are lines of the form > >{some text} & {some more text} & {text c} & {text d} \\ Wrong list, for this sort of question. This list is _supposed_ to be for Debian specific usage questions. -- Chris.
Re: another script query (perl?)
On Mon, Sep 10, 2007 at 08:30:42PM -0400, Celejar wrote: > On Fri, 7 Sep 2007 15:04:50 +0100 > Richard Lyons <[EMAIL PROTECTED]> wrote: > [...] > > > > I need a script to read a text file (actually tex) and parse lines of a > > table that may or may not span newline characters in the file. > > Basically, there are lines of the form > > > >{some text} & {some more text} & {text c} & {text d} \\ [...] > > Take a look at the perl Text::ParseWords module ('man > Text::ParseWords'). It may do what you want, depending on your needs > with respect to quoting and escaping. Why yes, that is interesting, although it came just too late for me. But there is sure to be a next time! -- richard
SOLVED: another script query (perl?)
On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote: > Richard Lyons wrote: > > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote: > >> Richard Lyons wrote: [...] > >>> I need a script to read a text file (actually tex) and parse lines of a > >>> table that may or may not span newline characters in the file. > >>> Basically, there are lines of the form > >>> > >>>{some text} & {some more text} & {text c} & {text d} \\ > >>> > >>> where the braces are only for clarity and do not occur in the files, and > >>> where the bits of text may include whitespace which may include newline > >>> characters. There may also be escaped ampersands in the text ('\&'), and > >>> the text fragments may be empty. > >>> > >>> I suspect perl may be the way forward. I need to be able to read each > >>> file, parse each set of three ampersands with a double backslash > >>> breaking it into four substrings, manipulate the substrings and write > >>> the file anew. A typical manipulation will be to take text c and copy > >>> it to text d. I shall also try to strip leading and trailing whitespace > >>> to tidy up the file. > >>> > >>> > >> please give real examples the text you have, as well as more info about > >> what processing you will do with it. [...] > > > > \mbox{Walls} &Plain plastered and painted white. &GC but to soiled > > around switch, RHS as entering, HL marks. OW nail near centre, some > > blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH > >corner, white painted, cracks at junctions. & \\ > > > > and here is another: > > > >&catch, diecast \& epoxy coated with security lock & GC &\\ > > > well, I'd say something along these lines assuming that you have $l > populated with the entire piece you want. > Also note that this attempts to avoid use of regexps where possible, as > they tend to be slow and hard to read. Not that I dislike regexps, but I > don't think they're necessary here. 
Also note that none of this code has > been tested, it's the product of about 5 minutes of hacking. > > my @phrases = split('&', $l); > { > my @tmp; > while(my $phrase = shift @phrases) { > if (substr($phrase, -2) eq '\') { >my $tmp = $phrase .'&'. (shift @phrases); > } > push @tmp, $phrase; > } > @phrases = @tmp; > } > > # remove trailing or leading whitespace > foreach my $phrase (@phrases) { > $phrase =~ s/^\s//; #remove leading spaces > $phrase =~ s/\s$//; # remove trailing spaces > $phrase =~ s/\n/ /g; # change all new-line chars to spaces > } > > # now reconstruct your text however you want it. > # I have a good (free, public-domain) line splitter if you need one. I would like that. The script fragments were a great help, thanks. The main bug was that $tmp is unnecessary and should just be $phrase. I'd post the whole script (well, two actually -- I used a bash script to set things up and just called the perl to do the dirty work), but it is such a specific use that it would be of no use to anyone else. Pity really, after spending so many hours on it. Still, at least I learned some perl. I would recommend that anyone else looking for a beginners' introduction to perl try http://www.perltraining.com.au/notes.html as well as http://mailman.linuxchix.org/pipermail/courses/2003-September/001344.html (this second URL is lesson 10, which I have given because the earlier lessons do not index forward). There is, of course, a mass of other stuff, not least http://cpan.org and all the perldoc info. So thanks, case closed. -- richard
Re: another script query (perl?)
On Fri, 7 Sep 2007 15:04:50 +0100 Richard Lyons <[EMAIL PROTECTED]> wrote: > Hi, all you script wizards. > > I thought this would be easy, but I haven't found anything to crib > from... > > I need a script to read a text file (actually tex) and parse lines of a > table that may or may not span newline characters in the file. > Basically, there are lines of the form > >{some text} & {some more text} & {text c} & {text d} \\ > > where the braces are only for clarity and do not occur in the files, and > where the bits of text may include whitespace which may include newline > characters. There may also be escaped ampersands in the text ('\&'), and > the text fragments may be empty. > > I suspect perl may be the way forward. I need to be able to read each > file, parse each set of three ampersands with a double backslash > breaking it into four substrings, manipulate the substrings and write > the file anew. A typical manipulation will be to take text c and copy > it to text d. I shall also try to strip leading and trailing whitespace > to tidy up the file. > > Any and all pointers will be gratefully received! Take a look at the perl Text::ParseWords module ('man Text::ParseWords'). It may do what you want, depending on your needs with respect to quoting and escaping. > richard Celejar -- mailmin.sourceforge.net - remote access via secure (OpenPGP) email ssuds.sourceforge.net - A Simple Sudoku Solver and Generator
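For anyone following up on this suggestion, here is a minimal sketch of how Text::ParseWords might be applied to one of the rows posted in this thread. It assumes the module's parse_line function with an ampersand delimiter pattern; the variable names are illustrative, and the row is the second sample with its trailing \\ already removed. One caveat: parse_line treats ' and " as quote characters, so cells containing apostrophes or inch marks would probably need a different approach.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Second sample row from the thread, after the copy, minus the trailing \\
# (removing the terminator first is an assumption made to keep this short).
my $row = '& catch, diecast \& epoxy coated with security lock & GC & GC';

# A true second argument keeps backslashes, so \& stays escaped for LaTeX.
# The delimiter pattern also swallows whitespace around each ampersand.
my @fields = parse_line('\s*&\s*', 1, $row);

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Because parse_line skips over any backslash-escaped character, the \& inside the second cell is not taken as a column separator.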
Re: another script query (perl?)
On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote: > Richard Lyons wrote: > > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote: > > > > > >> Richard Lyons wrote: [...] > >>>{some text} & {some more text} & {text c} & {text d} \\ > >>> > >>> where the braces are only for clarity and do not occur in the files, and > >>> where the bits of text may include whitespace which may include newline > >>> characters. There may also be escaped ampersands in the text ('\&'), and > >>> the text fragments may be empty. > >>> > >>> I suspect perl may be the way forward. I need to be able to read each > >>> file, parse each set of three ampersands with a double backslash > >>> breaking it into four substrings, manipulate the substrings and write > >>> the file anew. A typical manipulation will be to take text c and copy > >>> it to text d. I shall also try to strip leading and trailing whitespace > >>> to tidy up the file. [...] > > > well, I'd say something along these lines assuming that you have $l > populated with the entire piece you want. > Also note that this attempts to avoid use of regexps where possible, as > they tend to be slow and hard to read. Not that I dislike regexps, but I > don't think they're necessary here. Also note that none of this code has > been tested, it's the product of about 5 minutes of hacking. > > my @phrases = split('&', $l); > { > my @tmp; > while(my $phrase = shift @phrases) { > if (substr($phrase, -2) eq '\') { >my $tmp = $phrase .'&'. (shift @phrases); > } > push @tmp, $phrase; > } > @phrases = @tmp; > } > > # remove trailing or leading whitespace > foreach my $phrase (@phrases) { > $phrase =~ s/^\s//; #remove leading spaces > $phrase =~ s/\s$//; # remove trailing spaces > $phrase =~ s/\n/ /g; # change all new-line chars to spaces > } > > # now reconstruct your text however you want it. > # I have a good (free, public-domain) line splitter if you need one. > Thanks for that. I shall have a serious look at it on Sunday. 
I must admit I had expected the solution to be an inscrutable regexp, so this is cool. -- richard
Re: another script query (perl?)
Richard Lyons wrote: > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote: > > >> Richard Lyons wrote: >> >>> Hi, all you script wizards. >>> >>> I thought this would be easy, but I haven't found anything to crib >>> from... >>> >>> I need a script to read a text file (actually tex) and parse lines of a >>> table that may or may not span newline characters in the file. >>> Basically, there are lines of the form >>> >>>{some text} & {some more text} & {text c} & {text d} \\ >>> >>> where the braces are only for clarity and do not occur in the files, and >>> where the bits of text may include whitespace which may include newline >>> characters. There may also be escaped ampersands in the text ('\&'), and >>> the text fragments may be empty. >>> >>> I suspect perl may be the way forward. I need to be able to read each >>> file, parse each set of three ampersands with a double backslash >>> breaking it into four substrings, manipulate the substrings and write >>> the file anew. A typical manipulation will be to take text c and copy >>> it to text d. I shall also try to strip leading and trailing whitespace >>> to tidy up the file. >>> >>> Any and all pointers will be gratefully received! >>> >>> >>> >> please give real examples the text you have, as well as more info about >> what processing you will do with it. >> There are multiple ways to approach this, we need to have more >> information first. >> >> > I'm not sure it helps a lot, as they vary quite lot, but here is one: > > \mbox{Walls} &Plain plastered and painted white. &GC but to soiled > around switch, RHS as entering, HL marks. OW nail near centre, some > blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH >corner, white painted, cracks at junctions. & \\ > > and here is another: > >&catch, diecast \& epoxy coated with security lock & GC &\\ > > If it is unclear to any non-latex-user, the ampersands are table column > separators in latex. 
> > After the manipulation I gave as an example, (copu text c to text d), I > would hope they would look like this: > > \mbox{Walls} & Plain plastered and painted white. & GC but to soiled > around switch, RHS as entering, HL marks. OW nail near centre, some > blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH > corner, white painted, cracks at junctions. & GC but to soiled around > switch, RHS as entering, HL marks. OW nail near centre, some blue-tac > remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white > painted, cracks at junctions. \\ > > and: > > & catch, diecast \& epoxy coated with security lock & GC & GC \\ > > The first example shows the problem of included newlines, which might > occur as here or anywhere else in the text. Note that the whole text > fragment has been copied to the previously void fourth field. > > The second example shows the need not to be confused by '\&'. > > If that is any clearer... > > well, I'd say something along these lines assuming that you have $l populated with the entire piece you want. Also note that this attempts to avoid use of regexps where possible, as they tend to be slow and hard to read. Not that I dislike regexps, but I don't think they're necessary here. Also note that none of this code has been tested, it's the product of about 5 minutes of hacking. my @phrases = split('&', $l); { my @tmp; while(my $phrase = shift @phrases) { if (substr($phrase, -2) eq '\') { my $tmp = $phrase .'&'. (shift @phrases); } push @tmp, $phrase; } @phrases = @tmp; } # remove trailing or leading whitespace foreach my $phrase (@phrases) { $phrase =~ s/^\s//; #remove leading spaces $phrase =~ s/\s$//; # remove trailing spaces $phrase =~ s/\n/ /g; # change all new-line chars to spaces } # now reconstruct your text however you want it. # I have a good (free, public-domain) line splitter if you need one. signature.asc Description: OpenPGP digital signature
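The fragment above was posted untested. As a hedged illustration only, the following is one way the same split-and-rejoin idea could be made to run. It folds in the fix reported later in the thread ($tmp should just be $phrase) and a few further changes that are assumptions of this sketch rather than anything tabris or Richard posted: the backslash test looks at the last character only, the loop runs on the array so that empty cells are not dropped, and the trimming removes runs of whitespace. The sub name split_row is hypothetical.

```perl
use strict;
use warnings;

# Split one table row on '&', then glue back the pieces that were cut at an
# escaped ampersand (that is, a piece whose last character is a backslash).
sub split_row {
    my ($l) = @_;
    # A limit of -1 keeps trailing empty fields, which split otherwise drops.
    my @phrases = split /&/, $l, -1;
    my @fields;
    while (@phrases) {                  # test the array, so '' and '0' survive
        my $phrase = shift @phrases;
        while (@phrases && length($phrase) && substr($phrase, -1) eq '\\') {
            $phrase .= '&' . (shift @phrases);   # rejoin across '\&'
        }
        push @fields, $phrase;
    }
    for my $phrase (@fields) {
        $phrase =~ s/\n/ /g;            # change newlines to spaces
        $phrase =~ s/^\s+//;            # remove all leading whitespace
        $phrase =~ s/\s+\z//;           # remove all trailing whitespace
    }
    return @fields;
}

# Second sample row from the thread, without its trailing \\ terminator.
my @f = split_row("&catch, diecast \\& epoxy coated with security lock & GC &");
print join(' | ', map { "[$_]" } @f), "\n";
```

The row yields four cells, the first and last of them empty, with the \& left intact inside the second.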
Re: another script query (perl?)
Perhaps you could show us what you've attempted so far. Also, perlmonks.org is a good place to learn more about perl. -- Neil Watson | Debian Linux System Administrator| Uptime 15 days http://watson-wilson.ca
Re: another script query (perl?)
On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote: > Richard Lyons wrote: > > Hi, all you script wizards. > > > > I thought this would be easy, but I haven't found anything to crib > > from... > > > > I need a script to read a text file (actually tex) and parse lines of a > > table that may or may not span newline characters in the file. > > Basically, there are lines of the form > > > >{some text} & {some more text} & {text c} & {text d} \\ > > > > where the braces are only for clarity and do not occur in the files, and > > where the bits of text may include whitespace which may include newline > > characters. There may also be escaped ampersands in the text ('\&'), and > > the text fragments may be empty. > > > > I suspect perl may be the way forward. I need to be able to read each > > file, parse each set of three ampersands with a double backslash > > breaking it into four substrings, manipulate the substrings and write > > the file anew. A typical manipulation will be to take text c and copy > > it to text d. I shall also try to strip leading and trailing whitespace > > to tidy up the file. > > > > Any and all pointers will be gratefully received! > > > > > please give real examples the text you have, as well as more info about > what processing you will do with it. > There are multiple ways to approach this, we need to have more > information first. > I'm not sure it helps a lot, as they vary quite lot, but here is one: \mbox{Walls} &Plain plastered and painted white. &GC but to soiled around switch, RHS as entering, HL marks. OW nail near centre, some blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white painted, cracks at junctions. & \\ and here is another: &catch, diecast \& epoxy coated with security lock & GC &\\ If it is unclear to any non-latex-user, the ampersands are table column separators in latex. 
After the manipulation I gave as an example (copy text c to text d), I would hope they would look like this: \mbox{Walls} & Plain plastered and painted white. & GC but to soiled around switch, RHS as entering, HL marks. OW nail near centre, some blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white painted, cracks at junctions. & GC but to soiled around switch, RHS as entering, HL marks. OW nail near centre, some blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white painted, cracks at junctions. \\ and: & catch, diecast \& epoxy coated with security lock & GC & GC \\ The first example shows the problem of included newlines, which might occur as here or anywhere else in the text. Note that the whole text fragment has been copied to the previously void fourth field. The second example shows the need not to be confused by '\&'. If that is any clearer... -- richard
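The before and after rows in this message can be turned into a small worked example. What follows is only a sketch of one possible approach, not the script Richard ended up with (which was never posted): it splits on any ampersand that is not preceded by a backslash, tidies the whitespace, copies the third cell into the fourth and rebuilds the row. The sub name copy_c_to_d is made up for illustration, and the sketch assumes a backslash directly before an ampersand always means an escaped ampersand.

```perl
use strict;
use warnings;

# Rewrite one LaTeX table row: tidy each cell and copy cell 3 into cell 4.
# Returns nothing if the row does not have exactly four cells.
sub copy_c_to_d {
    my ($row) = @_;
    $row =~ s/\s*\\\\\s*\z//;               # drop the trailing \\ terminator
    # Split on '&' unless a backslash precedes it; -1 keeps empty cells.
    my @cell = split /(?<!\\)&/, $row, -1;
    return unless @cell == 4;
    for (@cell) {
        s/\s+/ /g;                          # newlines and runs of blanks -> one space
        s/^ //;                             # trim the leading space, if any
        s/ \z//;                            # trim the trailing space, if any
    }
    $cell[3] = $cell[2];                    # "text c" copied to "text d"
    my $out = join(' & ', @cell) . ' \\\\'; # rebuild and restore the terminator
    $out =~ s/^\s+//;                       # an empty first cell leaves a stray space
    return $out;
}

print copy_c_to_d("&catch, diecast \\& epoxy coated with security lock & GC &\\\\"), "\n";
```

For the second sample row this produces the line Richard asked for, with GC copied into the previously empty fourth cell and the \& untouched.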
Re: another script query (perl?)
Richard Lyons wrote: > Hi, all you script wizards. > > I thought this would be easy, but I haven't found anything to crib > from... > > I need a script to read a text file (actually tex) and parse lines of a > table that may or may not span newline characters in the file. > Basically, there are lines of the form > >{some text} & {some more text} & {text c} & {text d} \\ > > where the braces are only for clarity and do not occur in the files, and > where the bits of text may include whitespace which may include newline > characters. There may also be escaped ampersands in the text ('\&'), and > the text fragments may be empty. > > I suspect perl may be the way forward. I need to be able to read each > file, parse each set of three ampersands with a double backslash > breaking it into four substrings, manipulate the substrings and write > the file anew. A typical manipulation will be to take text c and copy > it to text d. I shall also try to strip leading and trailing whitespace > to tidy up the file. > > Any and all pointers will be gratefully received! > > Please give real examples of the text you have, as well as more info about what processing you will do with it. There are multiple ways to approach this; we need to have more information first.
another script query (perl?)
Hi, all you script wizards. I thought this would be easy, but I haven't found anything to crib from... I need a script to read a text file (actually tex) and parse lines of a table that may or may not span newline characters in the file. Basically, there are lines of the form {some text} & {some more text} & {text c} & {text d} \\ where the braces are only for clarity and do not occur in the files, and where the bits of text may include whitespace which may include newline characters. There may also be escaped ampersands in the text ('\&'), and the text fragments may be empty. I suspect perl may be the way forward. I need to be able to read each file, parse each set of three ampersands with a double backslash breaking it into four substrings, manipulate the substrings and write the file anew. A typical manipulation will be to take text c and copy it to text d. I shall also try to strip leading and trailing whitespace to tidy up the file. Any and all pointers will be gratefully received! -- richard
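Since the rows may span newlines, reading the file line by line is awkward. One possible way round this, sketched below under stated assumptions, is to hold the whole file in a single string (for instance by undefining $/ before reading) and hand every chunk that ends in \\ to a callback. The sub name map_rows and the callback are hypothetical, and the sketch assumes that every \\ in the text ends a table row, which will not hold for every LaTeX file.

```perl
use strict;
use warnings;

# Apply $fix to every chunk of $text that ends in a LaTeX row terminator (\\).
# Anything after the last terminator is passed through unchanged.
sub map_rows {
    my ($text, $fix) = @_;
    # /s lets '.' cross newlines, so a row may span several lines;
    # /e evaluates the replacement as code, /g repeats for every row.
    $text =~ s/(.*?\\\\)/$fix->($1)/gse;
    return $text;
}

# Illustrative callback: count the rows and squeeze runs of blanks and tabs.
my $rows = 0;
my $tidy = sub {
    my ($chunk) = @_;
    $rows++;
    $chunk =~ s/[ \t]+/ /g;
    return $chunk;
};

my $out = map_rows("a  &  b \\\\\nc & d \\\\\n\\end{tabular}\n", $tidy);
print $out;
print "rows seen: $rows\n";
```

In a real script the callback would do the per-row work (splitting into cells, copying text c to text d, and so on), and the returned string would be written back out as the new file.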