Re: another script query (perl?)

2007-09-15 Thread Chris Bannister
On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote:
 Hi, all you script wizards.
 
 I thought this would be easy, but I haven't found anything to crib
 from...
 
 I need a script to read a text file (actually tex) and parse lines of a
 table that may or may not span newline characters in the file.
 Basically, there are lines of the form
 
{some text} & {some more text} & {text c} & {text d} \\


Wrong list, for this sort of question. This list is _supposed_ to be for
Debian specific usage questions.

-- 
Chris.
==


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: another script query (perl?) -- OT

2007-09-15 Thread Richard Lyons
On Sat, Sep 15, 2007 at 09:22:45PM +1200, Chris Bannister wrote:

 On Fri, Sep 07, 2007 at 03:04:50PM +0100, Richard Lyons wrote:
  Hi, all you script wizards.
  
  I thought this would be easy, but I haven't found anything to crib
  from...
  
  I need a script to read a text file (actually tex) and parse lines of a
  table that may or may not span newline characters in the file.
  Basically, there are lines of the form
  
 {some text} & {some more text} & {text c} & {text d} \\
 
 
 Wrong list, for this sort of question. This list is _supposed_ to be for
 Debian specific usage questions.

Yes, but these guys are _good_.  Problem was solved about a week back
thanks to them.  Have a nice day!

-- 
richard





Re: another script query (perl?)

2007-09-11 Thread Richard Lyons
On Mon, Sep 10, 2007 at 08:30:42PM -0400, Celejar wrote:

 On Fri, 7 Sep 2007 15:04:50 +0100
 Richard Lyons [EMAIL PROTECTED] wrote:
 
[...]
  
  I need a script to read a text file (actually tex) and parse lines of a
  table that may or may not span newline characters in the file.
  Basically, there are lines of the form
  
 {some text} & {some more text} & {text c} & {text d} \\
[...]
 
 Take a look at the Perl Text::ParseWords module ('man
 Text::ParseWords').  It may do what you want, depending on your needs
 with respect to quoting and escaping.

Why yes, that is interesting, albeit it came just too late for me.
But there is sure to be a next time!

-- 
richard





Re: another script query (perl?)

2007-09-10 Thread Celejar
On Fri, 7 Sep 2007 15:04:50 +0100
Richard Lyons [EMAIL PROTECTED] wrote:

 Hi, all you script wizards.
 
 I thought this would be easy, but I haven't found anything to crib
 from...
 
 I need a script to read a text file (actually tex) and parse lines of a
 table that may or may not span newline characters in the file.
 Basically, there are lines of the form
 
{some text} & {some more text} & {text c} & {text d} \\
 
 where the braces are only for clarity and do not occur in the files, and
 where the bits of text may include whitespace which may include newline
 characters. There may also be escaped ampersands in the text ('\&'), and
 the text fragments may be empty.
 
 I suspect perl may be the way forward.  I need to be able to read each
 file, parse each set of three ampersands with a double backslash
 breaking it into four substrings, manipulate the substrings and write
 the file anew.  A typical manipulation will be to take text c and copy
 it to text d. I shall also try to strip leading and trailing whitespace
 to tidy up the file.
 
 Any and all pointers will be gratefully received!

Take a look at the Perl Text::ParseWords module ('man
Text::ParseWords').  It may do what you want, depending on your needs
with respect to quoting and escaping.
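
For illustration, a minimal sketch of how parse_line from Text::ParseWords
might be applied to one table row. The helper name fields_via_parsewords is
made up for this example. The second argument is true so that backslashes are
kept and an escaped '\&' stays intact inside its field. One caveat, as far as I
understand the module: it treats ' and " as quoting characters, so a stray
apostrophe in the text can make the parse fail, which may rule it out for
ordinary prose.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one row on '&', leaving '\&' untouched.
sub fields_via_parsewords {
    my ($row) = @_;
    # Arguments: delimiter pattern, keep flag (true keeps backslashes
    # and quotes in the returned fields), and the line to split.
    return parse_line('&', 1, $row);
}

my @fields = fields_via_parsewords('a & b \& c & d');
print scalar(@fields), " fields\n";
```

Whitespace around each field is left in place here, so trimming would still be
a separate step.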

 richard

Celejar
--
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator





Re: another script query (perl?)

2007-09-07 Thread tabris
Richard Lyons wrote:
 Hi, all you script wizards.

 I thought this would be easy, but I haven't found anything to crib
 from...

 I need a script to read a text file (actually tex) and parse lines of a
 table that may or may not span newline characters in the file.
 Basically, there are lines of the form

{some text} & {some more text} & {text c} & {text d} \\

 where the braces are only for clarity and do not occur in the files, and
 where the bits of text may include whitespace which may include newline
 characters. There may also be escaped ampersands in the text ('\&'), and
 the text fragments may be empty.

 I suspect perl may be the way forward.  I need to be able to read each
 file, parse each set of three ampersands with a double backslash
 breaking it into four substrings, manipulate the substrings and write
 the file anew.  A typical manipulation will be to take text c and copy
 it to text d. I shall also try to strip leading and trailing whitespace
 to tidy up the file.

 Any and all pointers will be gratefully received!

   
Please give real examples of the text you have, as well as more info about
what processing you will do with it.
There are multiple ways to approach this; we need more
information first.





Re: another script query (perl?)

2007-09-07 Thread Neil Watson

Perhaps you could show us what you've attempted so far.  Also,
perlmonks.org is a good place to learn more about perl.

--
Neil Watson | Debian Linux
System Administrator| Uptime 15 days
http://watson-wilson.ca






Re: another script query (perl?)

2007-09-07 Thread Richard Lyons
On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:

 Richard Lyons wrote:
  Hi, all you script wizards.
 
  I thought this would be easy, but I haven't found anything to crib
  from...
 
  I need a script to read a text file (actually tex) and parse lines of a
  table that may or may not span newline characters in the file.
  Basically, there are lines of the form
 
 {some text} & {some more text} & {text c} & {text d} \\
 
  where the braces are only for clarity and do not occur in the files, and
  where the bits of text may include whitespace which may include newline
  characters. There may also be escaped ampersands in the text ('\&'), and
  the text fragments may be empty.
 
  I suspect perl may be the way forward.  I need to be able to read each
  file, parse each set of three ampersands with a double backslash
  breaking it into four substrings, manipulate the substrings and write
  the file anew.  A typical manipulation will be to take text c and copy
  it to text d. I shall also try to strip leading and trailing whitespace
  to tidy up the file.
 
  Any and all pointers will be gratefully received!
 

 Please give real examples of the text you have, as well as more info about
 what processing you will do with it.
 There are multiple ways to approach this; we need more
 information first.
 
I'm not sure it helps a lot, as they vary quite a lot, but here is one:

\mbox{Walls} & Plain plastered and painted white. & GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
 blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
   corner, white painted, cracks at junctions.  & \\

and here is another:

 & catch, diecast \& epoxy coated with security lock & GC & \\

If it is unclear to any non-latex-user, the ampersands are table column
separators in latex.

After the manipulation I gave as an example (copy text c to text d), I
would hope they would look like this:

\mbox{Walls} & Plain plastered and painted white. & GC but to soiled
around switch, RHS as entering, HL marks. OW nail near centre, some
blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
corner, white painted, cracks at junctions. & GC but to soiled around
switch, RHS as entering, HL marks. OW nail near centre, some blue-tac
remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white
painted, cracks at junctions. \\

and:

 & catch, diecast \& epoxy coated with security lock & GC & GC \\

The first example shows the problem of included newlines, which might
occur as here or anywhere else in the text. Note that the whole text
fragment has been copied to the previously void fourth field.

The second example shows the need not to be confused by '\&'.

If that is any clearer...

-- 
richard





Re: another script query (perl?)

2007-09-07 Thread Richard Lyons
On Fri, Sep 07, 2007 at 08:26:50AM -0700, tabris wrote:

 Richard Lyons wrote:
  On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:
 

  Richard Lyons wrote:
[...]
 {some text} & {some more text} & {text c} & {text d} \\
 
  where the braces are only for clarity and do not occur in the files, and
  where the bits of text may include whitespace which may include newline
  characters. There may also be escaped ampersands in the text ('\&'), and
  the text fragments may be empty.
 
  I suspect perl may be the way forward.  I need to be able to read each
  file, parse each set of three ampersands with a double backslash
  breaking it into four substrings, manipulate the substrings and write
  the file anew.  A typical manipulation will be to take text c and copy
  it to text d. I shall also try to strip leading and trailing whitespace
  to tidy up the file.
[...]

 well, I'd say something along these lines assuming that you have $l
 populated with the entire piece you want.
 Also note that this attempts to avoid use of regexps where possible, as
 they tend to be slow and hard to read. Not that I dislike regexps, but I
 don't think they're necessary here. Also note that none of this code has
 been tested, it's the product of about 5 minutes of hacking.
 
 my @phrases = split(/&/, $l, -1);  # -1 keeps trailing empty fields
 {
     my @tmp;
     while (@phrases) {
         my $phrase = shift @phrases;
         # a piece ending in a backslash was cut at an escaped '\&':
         # glue the following piece back on
         while (@phrases && substr($phrase, -1) eq '\\') {
             $phrase .= '&' . shift @phrases;
         }
         push @tmp, $phrase;
     }
     @phrases = @tmp;
 }
 
 # remove trailing or leading whitespace
 foreach my $phrase (@phrases) {
     $phrase =~ s/\n/ /g;  # change all new-line chars to spaces
     $phrase =~ s/^\s+//;  # remove leading spaces
     $phrase =~ s/\s+$//;  # remove trailing spaces
 }
 
 # now reconstruct your text however you want it.
 # I have a good (free, public-domain) line splitter if you need one.
 

Thanks for that.  I shall have a serious look at it on Sunday.  I must
admit I had expected the solution to be an inscrutable regexp, so this
is cool.

-- 
richard





Re: another script query (perl?)

2007-09-07 Thread tabris
Richard Lyons wrote:
 On Fri, Sep 07, 2007 at 07:19:17AM -0700, tabris wrote:

   
 Richard Lyons wrote:
 
 Hi, all you script wizards.

 I thought this would be easy, but I haven't found anything to crib
 from...

 I need a script to read a text file (actually tex) and parse lines of a
 table that may or may not span newline characters in the file.
 Basically, there are lines of the form

{some text} & {some more text} & {text c} & {text d} \\

 where the braces are only for clarity and do not occur in the files, and
 where the bits of text may include whitespace which may include newline
 characters. There may also be escaped ampersands in the text ('\&'), and
 the text fragments may be empty.

 I suspect perl may be the way forward.  I need to be able to read each
 file, parse each set of three ampersands with a double backslash
 breaking it into four substrings, manipulate the substrings and write
 the file anew.  A typical manipulation will be to take text c and copy
 it to text d. I shall also try to strip leading and trailing whitespace
 to tidy up the file.

 Any and all pointers will be gratefully received!

   
   
 Please give real examples of the text you have, as well as more info about
 what processing you will do with it.
 There are multiple ways to approach this; we need more
 information first.

 
 I'm not sure it helps a lot, as they vary quite a lot, but here is one:

 \mbox{Walls} & Plain plastered and painted white. & GC but to soiled
 around switch, RHS as entering, HL marks. OW nail near centre, some
  blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
corner, white painted, cracks at junctions.  & \\

 and here is another:

 & catch, diecast \& epoxy coated with security lock & GC & \\

 If it is unclear to any non-latex-user, the ampersands are table column
 separators in latex.

 After the manipulation I gave as an example (copy text c to text d), I
 would hope they would look like this:

 \mbox{Walls} & Plain plastered and painted white. & GC but to soiled
 around switch, RHS as entering, HL marks. OW nail near centre, some
 blue-tac remnants. LHW hairline cracking at HL. pipe boxing far RH
 corner, white painted, cracks at junctions. & GC but to soiled around
 switch, RHS as entering, HL marks. OW nail near centre, some blue-tac
 remnants. LHW hairline cracking at HL. pipe boxing far RH corner, white
 painted, cracks at junctions. \\

 and:

 & catch, diecast \& epoxy coated with security lock & GC & GC \\

 The first example shows the problem of included newlines, which might
 occur as here or anywhere else in the text. Note that the whole text
 fragment has been copied to the previously void fourth field.

 The second example shows the need not to be confused by '\&'.

 If that is any clearer...

   
well, I'd say something along these lines assuming that you have $l
populated with the entire piece you want.
Also note that this attempts to avoid use of regexps where possible, as
they tend to be slow and hard to read. Not that I dislike regexps, but I
don't think they're necessary here. Also note that none of this code has
been tested, it's the product of about 5 minutes of hacking.

my @phrases = split(/&/, $l, -1);  # -1 keeps trailing empty fields
{
    my @tmp;
    while (@phrases) {
        my $phrase = shift @phrases;
        # a piece ending in a backslash was cut at an escaped '\&':
        # glue the following piece back on
        while (@phrases && substr($phrase, -1) eq '\\') {
            $phrase .= '&' . shift @phrases;
        }
        push @tmp, $phrase;
    }
    @phrases = @tmp;
}

# remove trailing or leading whitespace
foreach my $phrase (@phrases) {
    $phrase =~ s/\n/ /g;  # change all new-line chars to spaces
    $phrase =~ s/^\s+//;  # remove leading spaces
    $phrase =~ s/\s+$//;  # remove trailing spaces
}

# now reconstruct your text however you want it.
# I have a good (free, public-domain) line splitter if you need one.
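
For comparison, here is a rough sketch of the regexp route, offered as one
possible approach rather than what was actually used to solve the problem. It
assumes the text passed in is only the table body (no preamble or
\begin{tabular} lines), that every row ends in a double backslash, and that a
column separator is any '&' not preceded by a backslash. The names split_row
and copy_c_to_d are invented for this example.

```perl
use strict;
use warnings;

# Split one row (the text before its closing \\) into trimmed fields.
sub split_row {
    my ($row) = @_;
    # Split on '&' unless a backslash precedes it; the -1 limit keeps
    # trailing empty fields, which split would otherwise discard.
    my @fields = split /(?<!\\)&/, $row, -1;
    for my $field (@fields) {
        $field =~ s/\s+/ /g;    # fold newlines and runs of whitespace
        $field =~ s/^ //;       # drop the leading blank, if any
        $field =~ s/ $//;       # drop the trailing blank, if any
    }
    return @fields;
}

# Copy field c into field d for every four-field row in $text.
sub copy_c_to_d {
    my ($text) = @_;
    $text =~ s{(\s*)(.*?)\s*\\\\}{
        my ($lead, $row) = ($1, $2);   # save captures before other matches
        my @f = split_row($row);
        $f[3] = $f[2] if @f == 4;      # rows of any other width are only tidied
        $lead . join(' & ', @f) . ' \\\\';
    }gse;
    return $text;
}

print copy_c_to_d("& catch, diecast \\& epoxy & GC &\\\\\n");
```

To apply it to a file, one could slurp the contents into a scalar, pass it
through copy_c_to_d, and write the result back out. Whitespace between rows is
preserved, while whitespace inside a field is collapsed to single spaces.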


