Help to set up a filter!

2011-01-12 Thread Marek Stepanek


Hello all!


I want to clean up a film script that is in bad HTML shape. I have already
replaced nearly everything that was formatted with a  , as well as many
white spaces and line breaks. What remains are the many actors' texts, which
are hanging between  and 
  Aaron is laying on the couch, looking drained. Ryan comes down
  the stairs. It looks like he's had some time to recover. He
  walks over and takes a seat.

AARON
  How's Mom?

RYAN
  She's resting.
  A beat.



...
This has to become something like:
***



  Aaron is laying on the couch, looking drained. Ryan comes down
  the stairs. It looks like he's had some time to recover. He
  walks over and takes a seat.

AARON
  How's Mom?

RYAN


She's resting.
  A beat.



***

I don't mind the formatting between the tags; BBEdit will take care of it
at the end.

I have set up a Perl filter, but the problem is that it does not iterate
over the many occurrences of \s+[^<]+?;


foreach ($_ =~ m,(\s+[^<]+?,$&,;
$paragraf =~ s,\n\n,g;
$paragraf =~ s!\s{2,}!!g;
print $paragraf;
$paragraf = ();
}

print;
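One likely reason the foreach version seems not to iterate: foreach (LIST) evaluates m//g once, in list context, before the first pass, so inside the loop $1 holds only the capture from the last match, on every pass. The sketch below compares $1, the loop variable, and a scalar-context while loop on an invented snippet; the "<p" that ends each run is an assumption standing in for the tags that were stripped from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample: two "naked" runs, each followed by a <p> tag.
# ASSUMPTION: "<p" stands in for whatever tag really followed the text.
my $doc = "<p>A</p>\n  first run\n<p>B</p>\n  second run\n<p>C</p>";

# Strip surrounding whitespace so results are easy to compare.
sub squash { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s; }

# foreach: m//g runs ONCE in list context and returns every capture;
# afterwards $1 is left holding the capture from the LAST match.
my (@via_dollar1, @via_loop_var);
foreach my $capture ($doc =~ m{(\s+[^<]+?)<p}g) {
    push @via_dollar1,  squash($1);        # same value on every pass
    push @via_loop_var, squash($capture);  # this pass's capture
}

# while: m//g in scalar context finds ONE match per test, so $1 is fresh.
my @via_while;
while ($doc =~ m{(\s+[^<]+?)<p}g) {
    push @via_while, squash($1);
}

print "foreach + \$1:       @via_dollar1\n";
print "foreach + loop var: @via_loop_var\n";
print "while + \$1:         @via_while\n";
```

In the foreach form, using the loop variable instead of $1 (or switching to while, as was eventually done) avoids the stale capture.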






-- 
You received this message because you are subscribed to the 
"BBEdit Talk" discussion group on Google Groups.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to
bbedit+unsubscr...@googlegroups.com
For more options, visit this group at

If you have a feature request or would like to report a problem, 
please email "supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: 


Re: Help to set up a filter!

2011-01-12 Thread Rick
I recommend HTML::Parser.



Re: Help to set up a filter!

2011-01-12 Thread Doug McNutt
At 19:12 +0100 1/12/11, Marek Stepanek wrote, and I snipped a bunch:
>Hello all!
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>
>$/ = undef;
>$_ = <>;
>
>foreach ($_ =~ m,(\s+[^<]+?   my $paragraf = $1;
>   $paragraf =~ s,,$&,;
>   $paragraf =~ s,\n\n   $paragraf =~ s,\n,,g;
>   $paragraf =~ s!\s{2,}!!g;
>   print $paragraf;
>   $paragraf = ();
>}
>
>print;

*

$& may refer to the last match as it occurred in a previous loop iteration
instead of the one you think.

But why do you need the loop at all?  The g flag will repeat each substitution
over the whole text if the target is $_ instead of $paragraf. Inside the loop
you probably don't want the g flags at all.

$/ = undef;
$paragraf = <>;
$paragraf =~ s,,$&,g;
$paragraf =~ s,\n\n,g;
$paragraf =~ s!\s{2,}!!g;
print $paragraf;
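Doug's point that /g already sweeps the whole string can be checked on an invented fragment. Here "\n" is only a stand-in for whatever replacement tag was stripped from the post.

```perl
use strict;
use warnings;

# Invented fragment in the shape of the script text, read as one string.
my $text = "AARON\n  How's Mom?\n\nRYAN\n  She's resting.\n\n  A beat.\n";

# One s///g visits every occurrence; no explicit loop is needed.
# In scalar context s///g returns how many substitutions it made.
my $count = ($text =~ s/\n\n/\n/g);

print "replaced $count blank line(s):\n$text";
```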

I may well be missing something, such as making sure not to replace accidental
matches. Aren't those commas dangerous? You can use almost any character as the
delimiter, but I prefer the pipe symbol when trying to avoid /\/ teepees. You
may also need the /s flag to allow matching of return characters. And come to
think of it, are they \n, or might they be \r in BBEdit tradition?
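A small sketch of those three points, using invented sample strings: brace delimiters avoid escaping the slash in "</p>", /s only matters for ".", since [^<] and \s match a newline anyway, and normalizing CR and CRLF to LF first takes the \n-versus-\r question off the table.

```perl
use strict;
use warnings;

# 1. Line ends: invented CR-only and CRLF samples, normalised to LF first.
my $classic = "FADE IN:\r\rINT. FUNERAL HOME - DAY\r";
my $mixed   = "line one\r\nline two\n";
(my $norm1 = $classic) =~ s/\r\n?/\n/g;   # CRLF or lone CR -> LF
(my $norm2 = $mixed)   =~ s/\r\n?/\n/g;

# 2. Delimiters: braces (or a pipe) mean the "/" in "</p>" needs no escaping.
(my $no_close = "<p>A</p>") =~ s{</p>}{}g;

# 3. /s only changes what "." matches; [^<] and \s match "\n" regardless.
my $two_lines = "first\nsecond";
my $dot_plain = ($two_lines =~ /first.second/)    ? 1 : 0;  # "." stops at "\n"
my $dot_s     = ($two_lines =~ /first.second/s)   ? 1 : 0;  # "." matches "\n"
my $neg_class = ($two_lines =~ /first[^<]second/) ? 1 : 0;  # class matches "\n"

print "$dot_plain $dot_s $neg_class\n";   # prints "0 1 1"
```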

-- 
-->  The best programming tool is a soldering iron <--



Re: Help to set up a filter!

2011-01-13 Thread Marek Stepanek
On 12.01.2011 23:15, Doug McNutt wrote:

> 
> The $& may be referring to the last match as it occurs in a previous loop 
> operation instead of what you think.
> 
> But why do you need the loop at all?  The g flag will repeat each substitute 
> over the whole text if the target is $_ instead of  $paragraf. In the loop 
> you probably don't want the g's at all.
> 
> $/ = undef;
> $paragraf = <>;
> $paragraf =~ s,,$&,g;
> $paragraf =~ s,\n\n $paragraf =~ s,\n,,g;
> $paragraf =~ s!\s{2,}!!g;
> print $paragraf;
> 
> I may well be missing something like being sure not to replace accidental 
> matches. Aren't those commas dangerous? You can use anything as the separator 
> but I prefer the pipe symbol when trying to avoid /\/ teepees.  You may also 
> need a ///s flag to allow matching of return characters. And come to think of 
> it are they \n or might they be \r in BBEdit tradition?
> 



Hello Doug!



Thank you for your reply. Either I did not understand your suggestion,
or you did not understand my problem :-)

The problem was iterating over the many "naked" text runs (without any
HTML tags) in a large text file, tagging each occurrence found, and
replacing the line breaks with . In the meantime I found the solution,
but I don't know why it works with a while loop and not with a foreach
loop. In the roughly 10 years I have been looking into Perl, I have
always had trouble understanding the difference between while and
foreach constructs. Even if this is off topic for this list, could
somebody explain the difference to me?

When I run the following script on the following example in the
debugger, it works. Used as a filter in BBEdit, however, the last
paragraph is not tagged. Strange! That probably means my filter is not
right ...

#!/usr/bin/perl

use strict;
use warnings;

$/ = undef;
$_ = <>;


while ($_ =~ m,(\s+[^<]+?)\n?,$1,;
$paragraf =~ s,\n\n\n,g;
$paragraf =~ s!\s{2,}!!g;
$paragraf =~ s!>\nhttp://www.w3.org/TR/html4/loose.dtd";>

Death at a Funeral Script





body {
font-family: monospace;
}
h1
{
margin: 2em;
font-size: 1.4em;
text-align: center;
}
p.mitte{
font-size: 1em;
text-align: center;
}
p.mitte_bold
{
font-size: 1em;
margin-left: 15em;
font-weight: bold;
}
p.mitte_normal
{
font-size: 1em;
margin-left: 15em;
font-weight: normal;
}
p.rechts
{
font-size: 1em;
text-align: right;
font-weight: bold;
}
p.links_bold
{
font-size: 1em;
margin: 0 0 0 6em;
font-weight: bold;
}
p.links_normal
{
font-size: 1em;
margin: 0em 0em 4em 6em;
font-weight: normal;
}
p.new_page {
margin-bottom: 6em;
}











DEATH AT A FUNERAL

Written by


 Chris Rock & Aeysha Carr


Based on "Death at a Funeral" by Dean Craig


02/6/09


FADE IN:


MUSIC CUE: "DON'T WORRY, BE HAPPY" by Bobby McFerrin plays as we


BEGIN CREDITS

INT. FUNERAL HOME - DAY

  We pan across a room filled with caskets.

EXT. CHURCH - DAY

  People somberly walk into a church.

EXT. FUNERAL HOME - DAY

  We see the signs outside of various funeral homes.

EXT. CEMETERY - DAY

  We close in on a HEADSTONE that reads: "DEATH AT A FUNERAL".

EXT. CHURCH - DAY

  ASIAN PALL BEARERS carry a coffin into a church.

EXT. CEMETERY - DAY

  We push in on another headstone that reads: "Starring Chris
  Rock". As we push through the cemetery, we see various co-
  stars' names chiseled on headstones.
  We see a fleet of hearses driving through a cemetery.

INT. FLORIST - DAY

  We see a FLORIST making a funeral arrangement.

EXT. CEMETERY - DAY

  CLOSE UP of another headstone that has the director's name
  chiseled on it.

END CREDITS

INT. LIVING ROOM - DAY

  MUSIC CUE: The music fades out.
  TIGHT SHOT on AARON (CHRIS ROCK) thirties, dressed in a black
  suit and tie. He looks straight ahead with a somber expression.
  The camera pulls back to reveal he is watching FOUR UNDERTAKERS
  (INCLUDING BRIAN) place a coffin on a raised platform.

(CONTINUED)







Re: Help to set up a filter!

2011-01-13 Thread Doug McNutt
At 17:07 +0100 1/13/11, Marek Stepanek wrote:
>
>Thank you for your reply. Whether I did not understand your suggestion,
>or you did not understand my problem :-)
>
>The problem was iterating over many "naked" (with out any html-tags)
>text in a large text file and tag these found occurrences, and replace
>the line breaks with . Meanwhile I found the solution, but I don't
>know, why this is working with a while-loop, and not with a
>foreach-loop. Since I am looking into Perl - this is now about 10 years
>- I have had always this comprehension problem between while and foreach
>constructs. Also if this is not on topic for this list, could somebody
>explain me this difference?
>
>When using the debugger with the following script and the following
>example, it is working. Used as a filter in BBEdit, the last paragraph
>is not tagged. Strange! That means, my filter is probably not right ...
>
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>$/ = undef;
>$_ = <>;
>
>
>while ($_ =~ m,(\s+[^<]+?   my $paragraf = $1;
>   my $orig  = $1;
>   $paragraf =~ s,()\n?,$1,;
>   $paragraf =~ s,\n\n   $paragraf =~ s,\n,\n,g;
>   $paragraf =~ s!\s{2,}!!g;
>   $paragraf =~ s!>\n   s/$orig/$paragraf/;
>}
>

Aha. Now I understand why you need the loop.

Two things worry me. In the while loop you use the g flag, so Perl has to
remember where it is in $_, the string that holds the document. When you do
the substitution, you are modifying the very string the initial match is
working through. Perhaps doing the substitution on a copy of what came in as
$_ would make a difference because of that.
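A minimal sketch of one way around editing $_ while m//g is walking it: let a single s///ge pass compute each replacement, so no loop edits the text it is still scanning. The pattern follows Marek's \s+[^<]+?, but, as an assumption, treats a run as ending before the next "<p". The <p>, </p> and <br> tags are placeholders for the ones stripped from the posts, and tag_naked_text and wrap_run are hypothetical names.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: wrap one naked run, turning its internal line
# breaks (and the spaces around them) into a placeholder <br>.
sub wrap_run {
    my ($body) = @_;
    $body =~ s/\s*\n\s*/<br>/g;
    return "\n<p>$body</p>\n";
}

# HYPOTHETICAL filter body. ASSUMPTION: a naked run is leading whitespace
# plus non-"<" text that is followed (after optional whitespace) by "<p".
sub tag_naked_text {
    my ($doc) = @_;
    # /e runs wrap_run() once per match; /g continues through the string.
    $doc =~ s{\s+([^<]+?)\s*(?=<p)}{wrap_run($1)}ge;
    return $doc;
}

print tag_naked_text("<p>RYAN</p>\n  She's resting.\n  A beat.\n<p>AARON</p>"), "\n";
```

As a BBEdit filter, the script would end with something like local $/; print tag_naked_text(scalar <>); (not tested inside BBEdit).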

The other is that the matches need an s option to let them match line ends.
At least I think they do, because a match typically stops when it reaches a
line end. But if it works in a shell, I may have some wrong ideas. With $/
undefined, it's conceivable that the s option is not required.

That last substitution changes the value of $1 but I don't think that's germane.
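Separately from $1, one thing worth checking in that last substitution: s/$orig/$paragraf/ interpolates $orig as a regex, so captured text containing metacharacters may fail to match itself. The last paragraph of the example is "(CONTINUED)", whose parentheses become a capture group; that may or may not be why it stays untagged, but \Q...\E (quotemeta) removes the doubt. A sketch with invented strings:

```perl
use strict;
use warnings;

my $doc  = "<p>END</p>\n(CONTINUED)\n";
my $orig = "\n(CONTINUED)\n";          # captured text, whitespace included
my $new  = "\n<p>(CONTINUED)</p>\n";   # placeholder tags

# Unquoted: the regex \n(CONTINUED)\n needs "\nCONTINUED\n", so nothing matches.
(my $unquoted = $doc) =~ s/$orig/$new/;

# Quoted: \Q...\E escapes "(", ")" and the newlines, so the text matches literally.
(my $quoted = $doc) =~ s/\Q$orig\E/$new/;

print "unquoted:\n$unquoted", "quoted:\n$quoted";
```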

It's also quite possible that the shell-based version is working on slightly
different text. I'm pretty sure what BBEdit hands the filter is recreated from
the 16-bit characters in the memory image of the file, and there can well be
subtle differences by the time it is delivered to the filter, line ends for
instance.

foreach takes a list as its argument. while can use anything that returns a
true or false value. There is also the C-style "for (start; stop; increment)"
form of a loop.
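The three loop forms Doug lists, side by side on an invented string. Note that the while version relies on pos() to resume where the previous m//g match stopped.

```perl
use strict;
use warnings;

my $text = "a1 b22 c333";

# foreach: the list is built once, up front (here by m//g in list context).
my @words;
foreach my $w ($text =~ /(\w+)/g) {
    push @words, $w;
}

# while: the condition is re-evaluated each time; m//g in scalar context
# finds the NEXT match and records where it ended in pos($text).
my @positions;
while ($text =~ /\w+/g) {
    push @positions, pos($text);   # offset just past the current match
}

# C-style for: explicit start; stop; increment.
my @lengths;
for (my $i = 0; $i < @words; $i++) {
    push @lengths, length $words[$i];
}

print "@words\n@positions\n@lengths\n";
```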

perl == magic

-- 
--> If  it's not  on  fire  it's  a  software  problem. <--



Re: Help to set up a filter!

2011-01-13 Thread Rick
A foreach requires Perl to hold the whole file in memory, while a "while" loop
processes a line at a time. Most issues with large files and "foreach" involve
hitting a system memory limit.

On Thu, Jan 13, 2011 at 8:07 AM, Marek Stepanek <
ms...@podiuminternational.org> wrote:

> The problem was iterating over many "naked" (with out any html-tags)
> text in a large text file and tag these found occurrences, and replace
> the line breaks with . Meanwhile I found the solution, but I don't
> know, why this is working with a while-loop, and not with a
> foreach-loop.
>



Re: Help to set up a filter!

2011-01-13 Thread Ronald J Kimball
On Thu, Jan 13, 2011 at 02:43:28PM -0800, Rick wrote:
> A foreach requires perl to hold the whole file in memory, while the "while"
> process a line at a time.

That entirely depends on the while loop.  In this case, Marek read the
whole file into memory before the while loop, which is iterating over regex
matches, not lines.
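To make Ronald's distinction concrete: whether a while loop reads line by line depends on what its condition does. A sketch using an in-memory filehandle on invented text:

```perl
use strict;
use warnings;

my $input = "AARON\n  How's Mom?\nRYAN\n";

# Line at a time: each <$fh> in the condition reads one record ($/ is "\n").
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $lines = 0;
while (my $line = <$fh>) {
    $lines++;
}
close $fh;

# Slurp: with $/ undefined, one <$fh> returns everything, so a later
# while loop (e.g. over m//g matches) iterates over an in-memory string.
open $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $whole = do { local $/; <$fh> };
close $fh;

print "read $lines lines one by one; slurped ", length($whole), " chars at once\n";
```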

Ronald
