Re: another script query (perl?) -- OT

2007-09-15 Thread Richard Lyons
On Sat, Sep 15, 2007 at 09:22:45PM +1200, Chris Bannister wrote:

> On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote:
> > Hi, all you script wizards.
> > 
> > I thought this would be easy, but I haven't found anything to crib
> > from...
> > 
> > I need a script to read a text file (actually tex) and parse lines of a
> > table that may or may not span newline characters in the file.
> > Basically, there are lines of the form
> > 
> >{some text} & {some more text} & {text c} & {text d} \\
> 
> 
> Wrong list, for this sort of question. This list is _supposed_ to be for
> Debian specific usage questions.

Yes, but these guys are _good_.  Problem was solved about a week back
thanks to them.  Have a nice day!

-- 
richard


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: another script query (perl?)

2007-09-15 Thread Chris Bannister
On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote:
> Hi, all you script wizards.
> 
> I thought this would be easy, but I haven't found anything to crib
> from...
> 
> I need a script to read a text file (actually tex) and parse lines of a
> table that may or may not span newline characters in the file.
> Basically, there are lines of the form
> 
>{some text} & {some more text} & {text c} & {text d} \\


Wrong list, for this sort of question. This list is _supposed_ to be for
Debian specific usage questions.

-- 
Chris.
==





Re: another script query (perl?)

2007-09-11 Thread Richard Lyons
On Mon, Sep 10, 2007 at 08:30:42PM -0400, Celejar wrote:

> On Fri, 7 Sep 2007 15:04:50 +0100
> Richard Lyons <[EMAIL PROTECTED]> wrote:
> 
[...]
> > 
> > I need a script to read a text file (actually tex) and parse lines of a
> > table that may or may not span newline characters in the file.
> > Basically, there are lines of the form
> > 
> >{some text} & {some more text} & {text c} & {text d} \\
[...]
> 
> Take a look at the perl Text::ParseWords module (see 'man
> Text::ParseWords').  It may do what you want, depending on your needs
> with respect to quoting and escaping.

Why yes, that is interesting, albeit it came just too late for me.
But there is sure to be a next time!

-- 
richard





SOLVED: another script query (perl?)

2007-09-11 Thread Richard Lyons
On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote:

> Richard Lyons wrote:
> > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
> >> Richard Lyons wrote:
[...]
> >>> I need a script to read a text file (actually tex) and parse lines of a
> >>> table that may or may not span newline characters in the file.
> >>> Basically, there are lines of the form
> >>>
> >>>{some text} & {some more text} & {text c} & {text d} \\
> >>>
> >>> where the braces are only for clarity and do not occur in the files, and
> >>> where the bits of text may include whitespace which may include newline
> >>> characters. There may also be escaped ampersands in the text ('\&'), and
> >>> the text fragments may be empty.
> >>>
> >>> I suspect perl may be the way forward.  I need to be able to read each
> >>> file, parse each set of three ampersands with a double backslash
> >>> breaking it into four substrings, manipulate the substrings and write
> >>> the file anew.  A typical manipulation will be to take text c and copy
> >>> it to text d. I shall also try to strip leading and trailing whitespace
> >>> to tidy up the file.
> >>>   
> >>>   
> >> Please give real examples of the text you have, as well as more info about
> >> what processing you will do with it.
[...]
> >
> > \mbox{Walls} &Plain plastered and painted white. &GC but to soiled
> > around switch, RHS as entering, HL marks. OW nail near centre, some
> >  blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
> >corner, white painted, cracks at junctions.  & \\
> >
> > and here is another:
> >
> >&catch, diecast \& epoxy coated with security lock & GC &\\

> >   
> Well, I'd say something along these lines, assuming that you have $l
> populated with the entire piece you want.
> Also note that this attempts to avoid use of regexps where possible, as
> they tend to be slow and hard to read. Not that I dislike regexps, but I
> don't think they're necessary here. Also note that none of this code has
> been tested; it's the product of about 5 minutes of hacking.
> 
> my @phrases = split('&', $l);
> {
>     my @tmp;
>     while(my $phrase = shift @phrases) {
>         if (substr($phrase, -2) eq '\') {
>             my $tmp = $phrase .'&'. (shift @phrases);
>         }
>         push @tmp, $phrase;
>     }
>     @phrases = @tmp;
> }
> 
> # remove trailing or leading whitespace
> foreach my $phrase (@phrases) {
>     $phrase =~ s/^\s//; #remove leading spaces
>     $phrase =~ s/\s$//; # remove trailing spaces
>     $phrase =~ s/\n/ /g; # change all new-line chars to spaces
> }
> 
> # now reconstruct your text however you want it.
> # I have a good (free, public-domain) line splitter if you need one.

I would like that.

The script fragments were a great help, thanks.  The main bug was that
$tmp is unnecessary: the joined string should just be assigned back to
$phrase.  I'd post the whole script (well, two actually -- I used a bash
script to set things up and just called the perl to do the dirty work),
but it is so specific to my files that it would be of no use to anyone
else.  Pity really, after spending so many hours on it.  Still, at least
I learned some perl.  To anyone else looking for a beginners'
introduction to perl, I would recommend

http://www.perltraining.com.au/notes.html

as well as

http://mailman.linuxchix.org/pipermail/courses/2003-September/001344.html

This second URL is lesson 10, which I have given because the earlier
lessons do not index forward.

There is, of course, a mass of other stuff, not least http://cpan.org and
all the perldoc info.

So thanks, case closed.

-- 
richard



Re: another script query (perl?)

2007-09-10 Thread Celejar
On Fri, 7 Sep 2007 15:04:50 +0100
Richard Lyons <[EMAIL PROTECTED]> wrote:

> Hi, all you script wizards.
> 
> I thought this would be easy, but I haven't found anything to crib
> from...
> 
> I need a script to read a text file (actually tex) and parse lines of a
> table that may or may not span newline characters in the file.
> Basically, there are lines of the form
> 
>{some text} & {some more text} & {text c} & {text d} \\
> 
> where the braces are only for clarity and do not occur in the files, and
> where the bits of text may include whitespace which may include newline
> characters. There may also be escaped ampersands in the text ('\&'), and
> the text fragments may be empty.
> 
> I suspect perl may be the way forward.  I need to be able to read each
> file, parse each set of three ampersands with a double backslash
> breaking it into four substrings, manipulate the substrings and write
> the file anew.  A typical manipulation will be to take text c and copy
> it to text d. I shall also try to strip leading and trailing whitespace
> to tidy up the file.
> 
> Any and all pointers will be gratefully received!

Take a look at the perl Text::ParseWords module (see 'man
Text::ParseWords').  It may do what you want, depending on your needs
with respect to quoting and escaping.
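
For illustration, a minimal sketch of how Text::ParseWords might be
applied to one of these rows.  The sample row and the delimiter pattern
'\s*&\s*' are assumptions made for the example, not something taken from
the thread.  parse_line treats a backslash as an escape, so '\&' is not
taken as a separator, but as far as I understand it also treats ' and "
as quote characters, which may not suit free prose in table cells.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# parse_line(DELIMITER_REGEX, KEEP, LINE): with KEEP true, backslashes
# (and any quote characters) are left in the returned fields.
my $row    = 'catch, diecast \& epoxy coated & GC & spare';
my @fields = parse_line('\s*&\s*', 1, $row);

# Print one field per line to show where the row was cut.
print "$_\n" for @fields;
```

With KEEP set to 1 the backslash stays in the field, so the text could be
written back to the tex file unchanged.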

> richard

Celejar
--
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator





Re: another script query (perl?)

2007-09-07 Thread Richard Lyons
On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote:

> Richard Lyons wrote:
> > On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
> >
> >   
> >> Richard Lyons wrote:
[...]
> >>>{some text} & {some more text} & {text c} & {text d} \\
> >>>
> >>> where the braces are only for clarity and do not occur in the files, and
> >>> where the bits of text may include whitespace which may include newline
> >>> characters. There may also be escaped ampersands in the text ('\&'), and
> >>> the text fragments may be empty.
> >>>
> >>> I suspect perl may be the way forward.  I need to be able to read each
> >>> file, parse each set of three ampersands with a double backslash
> >>> breaking it into four substrings, manipulate the substrings and write
> >>> the file anew.  A typical manipulation will be to take text c and copy
> >>> it to text d. I shall also try to strip leading and trailing whitespace
> >>> to tidy up the file.
[...]
> >   
> Well, I'd say something along these lines, assuming that you have $l
> populated with the entire piece you want.
> Also note that this attempts to avoid use of regexps where possible, as
> they tend to be slow and hard to read. Not that I dislike regexps, but I
> don't think they're necessary here. Also note that none of this code has
> been tested; it's the product of about 5 minutes of hacking.
> 
> my @phrases = split('&', $l);
> {
>     my @tmp;
>     while(my $phrase = shift @phrases) {
>         if (substr($phrase, -2) eq '\') {
>             my $tmp = $phrase .'&'. (shift @phrases);
>         }
>         push @tmp, $phrase;
>     }
>     @phrases = @tmp;
> }
> 
> # remove trailing or leading whitespace
> foreach my $phrase (@phrases) {
>     $phrase =~ s/^\s//; #remove leading spaces
>     $phrase =~ s/\s$//; # remove trailing spaces
>     $phrase =~ s/\n/ /g; # change all new-line chars to spaces
> }
> 
> # now reconstruct your text however you want it.
> # I have a good (free, public-domain) line splitter if you need one.
> 

Thanks for that.  I shall have a serious look at it on Sunday.  I must
admit I had expected the solution to be an inscrutable regexp, so this
is cool.

-- 
richard





Re: another script query (perl?)

2007-09-07 Thread tabris
Richard Lyons wrote:
> On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
>
>   
>> Richard Lyons wrote:
>> 
>>> Hi, all you script wizards.
>>>
>>> I thought this would be easy, but I haven't found anything to crib
>>> from...
>>>
>>> I need a script to read a text file (actually tex) and parse lines of a
>>> table that may or may not span newline characters in the file.
>>> Basically, there are lines of the form
>>>
>>>{some text} & {some more text} & {text c} & {text d} \\
>>>
>>> where the braces are only for clarity and do not occur in the files, and
>>> where the bits of text may include whitespace which may include newline
>>> characters. There may also be escaped ampersands in the text ('\&'), and
>>> the text fragments may be empty.
>>>
>>> I suspect perl may be the way forward.  I need to be able to read each
>>> file, parse each set of three ampersands with a double backslash
>>> breaking it into four substrings, manipulate the substrings and write
>>> the file anew.  A typical manipulation will be to take text c and copy
>>> it to text d. I shall also try to strip leading and trailing whitespace
>>> to tidy up the file.
>>>
>>> Any and all pointers will be gratefully received!
>>>
>>>   
>>>   
>> Please give real examples of the text you have, as well as more info about
>> what processing you will do with it.
>> There are multiple ways to approach this; we need to have more
>> information first.
>>
>> 
> I'm not sure it helps a lot, as they vary quite a lot, but here is one:
>
> \mbox{Walls} &Plain plastered and painted white. &GC but to soiled
> around switch, RHS as entering, HL marks. OW nail near centre, some
>  blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
>corner, white painted, cracks at junctions.  & \\
>
> and here is another:
>
>&catch, diecast \& epoxy coated with security lock & GC &\\
>
> If it is unclear to any non-latex-user, the ampersands are table column
> separators in latex.
>
> After the manipulation I gave as an example (copy text c to text d), I
> would hope they would look like this:
>
> \mbox{Walls} & Plain plastered and painted white. & GC but to soiled
> around switch, RHS as entering, HL marks. OW nail near centre, some
> blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
> corner, white painted, cracks at junctions. & GC but to soiled around
> switch, RHS as entering, HL marks. OW nail near centre, some blue-tac
> remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white
> painted, cracks at junctions. \\
>
> and:
>
>   & catch, diecast \& epoxy coated with security lock & GC & GC \\
>
> The first example shows the problem of included newlines, which might
> occur as here or anywhere else in the text. Note that the whole text
> fragment has been copied to the previously void fourth field.
>
> The second example shows the need not to be confused by '\&'.  
>
> If that is any clearer...
>
>   
Well, I'd say something along these lines, assuming that you have $l
populated with the entire piece you want.
Also note that this attempts to avoid use of regexps where possible, as
they tend to be slow and hard to read. Not that I dislike regexps, but I
don't think they're necessary here. Also note that none of this code has
been tested; it's the product of about 5 minutes of hacking.

my @phrases = split('&', $l);
{
    my @tmp;
    while(my $phrase = shift @phrases) {
        if (substr($phrase, -2) eq '\') {
            my $tmp = $phrase .'&'. (shift @phrases);
        }
        push @tmp, $phrase;
    }
    @phrases = @tmp;
}

# remove trailing or leading whitespace
foreach my $phrase (@phrases) {
    $phrase =~ s/^\s//; #remove leading spaces
    $phrase =~ s/\s$//; # remove trailing spaces
    $phrase =~ s/\n/ /g; # change all new-line chars to spaces
}

# now reconstruct your text however you want it.
# I have a good (free, public-domain) line splitter if you need one.
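
Since the fragment above is untested, here is a sketch of how the same
split-and-rejoin idea might look once made runnable.  The sub name
split_fields and the sample row are assumptions for illustration.
Compared with the fragment, it escapes the backslash in the string
literal and checks only the last character, appends to $phrase instead
of a throwaway $tmp, loops on the array rather than on the shifted value
so that empty fields do not end the loop, and strips runs of whitespace
rather than a single character.  It assumes '\&' is the only escape
that matters.

```perl
use strict;
use warnings;

# Split a row on "&", then glue back together any pieces that were
# separated at an escaped ampersand ("\&").
sub split_fields {
    my ($l) = @_;
    my @pieces = split /&/, $l, -1;    # -1 keeps trailing empty fields
    my @phrases;
    while (@pieces) {                  # test the array, not the value
        my $phrase = shift @pieces;
        # A piece ending in a backslash was cut at "\&": rejoin it.
        while (@pieces && length($phrase) && substr($phrase, -1) eq '\\') {
            $phrase .= '&' . shift(@pieces);
        }
        push @phrases, $phrase;
    }
    for my $phrase (@phrases) {
        $phrase =~ s/^\s+//;           # remove all leading whitespace
        $phrase =~ s/\s+$//;           # remove all trailing whitespace
        $phrase =~ s/\n/ /g;           # change remaining newlines to spaces
    }
    return @phrases;
}

my @fields = split_fields('&catch, diecast \& epoxy coated with security lock & GC &');
print join('|', @fields), "\n";
```

The row is passed in without its trailing '\\', so the empty first and
fourth columns come back as empty strings.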





Re: another script query (perl?)

2007-09-07 Thread Neil Watson

Perhaps you could show us what you've attempted so far.  Also,
perlmonks.org is a good place to learn more about perl.

--
Neil Watson | Debian Linux
System Administrator| Uptime 15 days
http://watson-wilson.ca






Re: another script query (perl?)

2007-09-07 Thread Richard Lyons
On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:

> Richard Lyons wrote:
> > Hi, all you script wizards.
> >
> > I thought this would be easy, but I haven't found anything to crib
> > from...
> >
> > I need a script to read a text file (actually tex) and parse lines of a
> > table that may or may not span newline characters in the file.
> > Basically, there are lines of the form
> >
> >{some text} & {some more text} & {text c} & {text d} \\
> >
> > where the braces are only for clarity and do not occur in the files, and
> > where the bits of text may include whitespace which may include newline
> > characters. There may also be escaped ampersands in the text ('\&'), and
> > the text fragments may be empty.
> >
> > I suspect perl may be the way forward.  I need to be able to read each
> > file, parse each set of three ampersands with a double backslash
> > breaking it into four substrings, manipulate the substrings and write
> > the file anew.  A typical manipulation will be to take text c and copy
> > it to text d. I shall also try to strip leading and trailing whitespace
> > to tidy up the file.
> >
> > Any and all pointers will be gratefully received!
> >
> >   
> Please give real examples of the text you have, as well as more info about
> what processing you will do with it.
> There are multiple ways to approach this; we need to have more
> information first.
> 
I'm not sure it helps a lot, as they vary quite a lot, but here is one:

\mbox{Walls} &Plain plastered and painted white. &GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
 blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
   corner, white painted, cracks at junctions.  & \\

and here is another:

   &catch, diecast \& epoxy coated with security lock & GC &\\

If it is unclear to any non-latex-user, the ampersands are table column
separators in latex.

After the manipulation I gave as an example (copy text c to text d), I
would hope they would look like this:

\mbox{Walls} & Plain plastered and painted white. & GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
corner, white painted, cracks at junctions. & GC but to soiled around
switch, RHS as entering, HL marks. OW nail near centre, some blue-tac
remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white
painted, cracks at junctions. \\

and:

  & catch, diecast \& epoxy coated with security lock & GC & GC \\

The first example shows the problem of included newlines, which might
occur as here or anywhere else in the text. Note that the whole text
fragment has been copied to the previously void fourth field.

The second example shows the need not to be confused by '\&'.  

If that is any clearer...
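
The manipulation described above might be sketched as follows.  This
assumes the row has already been isolated without its trailing '\\', and
that '\&' is the only escape that matters; the sub name copy_c_to_d and
the padding of short rows to four fields are made up for the example.

```perl
use strict;
use warnings;

# Tidy one table row (given without its trailing "\\") and copy the
# third column into the fourth.
sub copy_c_to_d {
    my ($row) = @_;
    my @f = split /(?<!\\)&/, $row, -1;   # "\&" is not a separator
    push @f, '' while @f < 4;             # pad short rows (assumption)
    for (@f) {
        s/\s+/ /g;      # newlines and runs of blanks become one space
        s/^ | $//g;     # strip the leading and trailing space, if any
    }
    $f[3] = $f[2];      # the "copy text c to text d" step
    return join(' & ', @f) . ' \\\\';     # re-append the row terminator
}

print copy_c_to_d('   &catch, diecast \& epoxy coated with security lock & GC &'), "\n";
```

Collapsing all whitespace first means a field that spans several lines
comes out as a single line, which is then copied whole.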

-- 
richard





Re: another script query (perl?)

2007-09-07 Thread tabris
Richard Lyons wrote:
> Hi, all you script wizards.
>
> I thought this would be easy, but I haven't found anything to crib
> from...
>
> I need a script to read a text file (actually tex) and parse lines of a
> table that may or may not span newline characters in the file.
> Basically, there are lines of the form
>
>{some text} & {some more text} & {text c} & {text d} \\
>
> where the braces are only for clarity and do not occur in the files, and
> where the bits of text may include whitespace which may include newline
> characters. There may also be escaped ampersands in the text ('\&'), and
> the text fragments may be empty.
>
> I suspect perl may be the way forward.  I need to be able to read each
> file, parse each set of three ampersands with a double backslash
> breaking it into four substrings, manipulate the substrings and write
> the file anew.  A typical manipulation will be to take text c and copy
> it to text d. I shall also try to strip leading and trailing whitespace
> to tidy up the file.
>
> Any and all pointers will be gratefully received!
>
>   
Please give real examples of the text you have, as well as more info about
what processing you will do with it.
There are multiple ways to approach this; we need to have more
information first.





another script query (perl?)

2007-09-07 Thread Richard Lyons
Hi, all you script wizards.

I thought this would be easy, but I haven't found anything to crib
from...

I need a script to read a text file (actually tex) and parse lines of a
table that may or may not span newline characters in the file.
Basically, there are lines of the form

   {some text} & {some more text} & {text c} & {text d} \\

where the braces are only for clarity and do not occur in the files, and
where the bits of text may include whitespace which may include newline
characters. There may also be escaped ampersands in the text ('\&'), and
the text fragments may be empty.

I suspect perl may be the way forward.  I need to be able to read each
file, parse each set of three ampersands with a double backslash
breaking it into four substrings, manipulate the substrings and write
the file anew.  A typical manipulation will be to take text c and copy
it to text d. I shall also try to strip leading and trailing whitespace
to tidy up the file.

Any and all pointers will be gratefully received!
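
One possible way to frame the task, as a sketch only: treat the whole
file as one string, take everything up to each '\\' as a row, and split
each row on ampersands that are not preceded by a backslash.  The name
parse_rows and the sample text are assumptions for the example; a real
tex file would also contain lines outside the table, which this does not
try to handle.

```perl
use strict;
use warnings;

# Split LaTeX table text into rows (terminated by "\\") and each row
# into fields (separated by "&", but not by the escaped form "\&").
sub parse_rows {
    my ($text) = @_;
    my @rows;
    # Each row is everything up to the next double backslash; /s lets
    # "." match newlines, so a row may span several lines of the file.
    while ($text =~ /\G(.*?)\\\\/gs) {
        my $row = $1;
        # The -1 limit keeps trailing empty fields, which split would
        # otherwise drop.
        my @fields = split /(?<!\\)&/, $row, -1;
        push @rows, \@fields;
    }
    return @rows;
}

my $sample = "a & b\n  more b & c & \\\\\n & x \\& y & GC &\\\\\n";
my @rows   = parse_rows($sample);
printf "%d rows, %d fields in the first\n", scalar @rows, scalar @{ $rows[0] };
```

Each row comes back as an array reference, so the fields can be changed
in place and the rows joined again with ' & ' and a closing '\\'.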

-- 
richard

