Re: Translating newlines to HTML paragraphs

2002-05-22 Thread John Brooking

--- Matthew Weier O'Phinney
<[EMAIL PROTECTED]> wrote:
> I've gone through and read all the other posts in
> reply to this, and they
> all seem to ignore a very simple solution.
> 
> First: strip off the \r\n:
> s/\r\n/\n/sg
> 
> Then look for the pattern \n\n (which would indicate
> the existence of an
> empty line. For example: "Some paragraph text\n\nA

I've got my solution; it's something like yours, and
it works fine. The main difference is that I explicitly
used \x0d and \x0a, because I wasn't sure whether \n and \r
map to the same ASCII codes on all platforms.
I treat any double newline as a paragraph break and
any single one as just a line break.

Here's the code I finally used (embedded tags, sorry
if code wraps in ugly places due to this silly Yahoo
editor):

sub NL2HTML {
   local $_ = shift;              # localize $_ so the caller's copy is untouched
   s/\x0d\x0a/\x0d/g;             # Strip LF out of CR/LF combinations (CR/LF -> bare CR)
   s/\x0d{2}|\x0a{2}/<\/p><p>/g;  # Replace double CR or LF with paragraph break
   s/\x0d|\x0a/<br>/g;            # Replace single CR or LF with line break
   return "<p>$_</p>";            # Wrap whole thing in outside <p>...</p>
}


=
"When you're following an angel, does it mean you have to throw your body off a 
building?" - They Might Be Giants, http://www.tmbg.com

Word of the week: Serendipity, see http://www.bartleby.com/61/93/S0279300.html

__
Do You Yahoo!?
LAUNCH - Your Yahoo! Music Experience
http://launch.yahoo.com

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Translating newlines to HTML paragraphs

2002-05-22 Thread fliptop

drieux wrote:

> but if you wanted to 'clean em all'
> 
> $line =~ s/(?:$eol)+/\n/g
> 
> would find the case of
> 
> \r
> \r\n
> \n
> \r\n\n
> 
> 
> and replace them all with a single '\n' for all
> occurrences in the $line that one is going through


wouldn't this regex work:

while (<>) {
   s/\s+$/\n/g;
}

without having to do all the checking to see if it's a \r or \n or 
whatever, just say 'replace any whitespace that occurs at the end of the 
line with a newline character'.
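
A minimal sketch of the per-line idea above, assuming the input is already split into lines on LF. The helper name trim_line_ending and the sample lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: replace any run of trailing whitespace
# (spaces, tabs, \r, \n) on one line with a single "\n".
sub trim_line_ending {
    my ($line) = @_;
    $line =~ s/\s+$/\n/;
    return $line;
}

for my $raw ("one\r\n", "two  \n", "three\n") {
    print trim_line_ending($raw);
}
```

Two things worth knowing about this approach: it also removes trailing spaces and tabs, and a final line with no trailing whitespace at all is left without a newline. Text that uses bare CR as its only line ending would arrive as one long line, so the loop would not split it.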






Re: Translating newlines to HTML paragraphs

2002-05-22 Thread drieux


On Tuesday, May 21, 2002, at 05:52 , Jake wrote:
[..]
> If the latter method works, that's cool; I haven't tested it.  I will admit
> that as I learn this stuff, I tend to do everything the hard way first, 
> then trim it down.

I have tested it on [darwin|solaris|redhat linux 7.2] - I could
get over to the windows2000 box if it were really required - but
I would need to rig it to be perlEnabled - and I just do not do
enough work over there to make that worth my time yet. the code is at:

http://www.wetware.com/drieux/pbl/cgi/ParseParmsToPara.txt

{ I think I will opt for the
my $cr = chr(13);    # the ASCII value for <CR> - the '\r'
my $lf = chr(10);    # the ASCII value for <LF> - the '\n'
my $eol = "$cr|$lf"; # the either/or pack here.

since, well it's almost like

use constant CR => chr(13);

without the emotional crisis }

about the only principal place we 'disagree' is that I tend to
do it the hardWay and then find that there was a module on CPAN
when I have to go back and rewrite some or all of it

> On Wednesday, May 22, 2002, at 06:28 , Matthew Weier O'Phinney wrote:
[..]
>
> The problem with using patterns such as s/$/<p>/m is that while it will
> match the end-of-line condition, it will not _replace_ it (see
> Programming Perl, chapter 2, on Regular Expressions and the s///
> operator).

Good Point!!!

but if you wanted to 'clean em all'

$line =~ s/(?:$eol)+/\n/g

would find the case of

\r
\r\n
\n
\r\n\n


and replace them all with a single '\n' for all
occurrences in the $line that one is going through

ciao
drieux
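
A quick check of the 'clean em all' substitution against the four cases listed above. This is only a sketch: squeeze_eol is a made-up helper name, and it uses an alternation group, because inside a bracketed class such as [\r|\n] the | is an ordinary character and would be matched too.

```perl
use strict;
use warnings;

my $cr  = chr(13);
my $lf  = chr(10);
my $eol = "$cr|$lf";    # alternation: CR or LF

# Hypothetical helper: collapse every run of CR/LF characters to one "\n".
sub squeeze_eol {
    my ($line) = @_;
    $line =~ s/(?:$eol)+/\n/g;
    return $line;
}

for my $ending ("\r", "\r\n", "\n", "\r\n\n") {
    my $out = squeeze_eol("a${ending}b");
    print $out eq "a\nb" ? "ok\n" : "not ok\n";
}
```

Note that collapsing runs this way also removes empty lines, so it throws away the blank-line-means-paragraph information discussed elsewhere in this thread.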

---

so far I have figured out

a) Perl's RegEx is like lex and yacc,
but without all the make files

b) like sed and awk, but all in one process






Re: Translating newlines to HTML paragraphs

2002-05-22 Thread Matthew Weier O'Phinney

I've gone through and read all the other posts in reply to this, and they
all seem to ignore a very simple solution.

First: convert each \r\n pair to \n:
s/\r\n/\n/sg

Then look for the pattern \n\n (which indicates an empty line; for
example: "Some paragraph text\n\nA new paragraph") and replace it with
\n<p>\n (which gives you "Some paragraph text\n<p>\nA new
paragraph"):
s/\n\n/\n<p>\n/sg

In both regular expressions, the g modifier replaces all instances.
The s modifier only changes what . matches, so it is harmless here
but not actually needed.

The problem with using patterns such as s/$/<p>/m is that while it will
match the end-of-line condition, it will not _replace_ the newline,
because $ is a zero-width assertion (see Programming Perl, chapter 2,
on Regular Expressions and the s/// operator).

--Matthew
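
The two substitutions described above can be sketched as a small runnable script. The helper name mark_paragraphs is made up, and the exact markup used in the replacement is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper name; the replacement markup is assumed.
sub mark_paragraphs {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;        # step 1: CR/LF pairs become LF
    $text =~ s/\n\n/\n<p>\n/g;   # step 2: an empty line becomes a paragraph tag
    return $text;
}

print mark_paragraphs("Some paragraph text\r\n\r\nA new paragraph"), "\n";
```

Single newlines are left alone, so only an empty line starts a new paragraph.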

On Mon, 20 May 2002 13:51:57 -0400, John Brooking wrote:
> Hello, all,
> 
>I'm trying to translate the value entered in a
> TEXTAREA tag to one or more HTML paragraphs. That means any newlines
> entered into the text box need to be turned into P tags by the script.
> But I'm having trouble coming up with a regex to do this. I know I need
> multiline mode, so I've been trying combinations involving "s/$/<p>/mg"
> or "s/^/<p>/mg", but nothing has worked so far. Everything I've tried
> has (1) added the P tag but not removed the newline, and/or (2) also
> added one at the end of the string (using $, or at the beginning using
> ^) even though there's not a newline there. The latter seems to be by
> Perl design, but I don't want it in this case.
> 
>   Here's the test program I'm using to experiment
> (warning to HTML email clients, there is an HTML tag in this code):
> 
> my $lf = chr(10);
> my $Text = "Para.${lf}Para.${lf}Para.";
> $Text =~ s/$/<p>/mg;
> print $Text;
> 
>   And a follow-up question: When newlines are entered
> into a TEXTAREA, is TEXTAREA standardized to use CR/LF pairs, or just
> CR, just LF, or does it depend on the client and/or the server platform?
> Do we need to worry about this? Or will the ^ and $ characters work
> correctly with any of these combinations so we don't need to worry about
> it?
> 
> Thanks in advance for any help.
> 
> - John
> 
> 




Re: Translating newlines to HTML paragraphs

2002-05-21 Thread Jake


> so I guess the question is - presume that it is either or
> and rip them all out before proceeding anyway

IMHO one should presume that it is that way... That's why we go to the
trouble of having standards, isn't it?  So that everyone doesn't have to go around
doing custom work for every different OS and application?

If programmers go out of their way to pick up after slackers who don't conform
to the W3C, doesn't that defeat the purpose of the W3C?

take a stand, a small stand albeit, but a stand nevertheless :)

btw... does anyone know of a browser that DOESN'T use CR/LF for
newlines?  Is anyone bored enough to go look?

>
> what I am not getting in these debates is why one should do
>
>   my $cr = chr(13);
>   my $lf = chr(10);
>   my $eol = "$cr|$lf";
>
> as opposed to leaving it to perl as
>
>   my $eol = '\r|\n';
>
> for tricks like:
>
>   my $pat= qr/(?:$eol)+/o;
>

If the latter method works, that's cool; I haven't tested it.  I will admit
that as I learn this stuff, I tend to do everything the hard way first, then
trim it down.

Cheers
J-







Re: Translating newlines to HTML paragraphs

2002-05-21 Thread drieux


On Tuesday, May 21, 2002, at 05:58 , Felix Geerinckx wrote:

> on Tue, 21 May 2002 12:35:59 GMT, [EMAIL PROTECTED] (Bob
> Showalter) wrote:
>
>> This is dependent on the browser, and not the client OS. The
>> HTML standard would be controlling here, and it's pretty
>> vague if you look at the TEXTAREA section.
>
> It was somewhat less vague in the 'HTML 3.2 Reference Specification':

so I guess the question is - presume that it is either or
and rip them all out before proceeding anyway

what I am not getting in these debates is why one should do

my $cr = chr(13);
my $lf = chr(10);
my $eol = "$cr|$lf";

as opposed to leaving it to perl as

my $eol = '\r|\n';

for tricks like:

my $pat= qr/(?:$eol)+/o;

ciao
drieux
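
On the chr(13) versus '\r' question above: as far as I know, perlport describes "\n" as a logical newline whose numeric value can differ by platform (classic Mac OS being the usual example), while chr(), octal escapes and hex escapes always name a fixed code. A minimal sketch for checking what a given perl does; the variable names are made up:

```perl
use strict;
use warnings;

# On ASCII-based platforms such as Linux this reports 13 and 10.
printf "\\r is %d, \\n is %d\n", ord("\r"), ord("\n");

# Three spellings of the same two fixed character codes.
my $by_number = chr(13) . chr(10);
my $by_octal  = "\015\012";
my $by_hex    = "\x0d\x0a";

print "octal and hex escapes agree with chr()\n"
    if $by_number eq $by_octal && $by_octal eq $by_hex;
```

So on the systems most CGI scripts run on, either spelling gives the same pattern; the numeric forms just remove the doubt.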

---






RE: Translating newlines to HTML paragraphs

2002-05-21 Thread Felix Geerinckx

on Tue, 21 May 2002 12:35:59 GMT, [EMAIL PROTECTED] (Bob 
Showalter) wrote:

> This is dependent on the browser, and not the client OS. The
> HTML standard would be controlling here, and it's pretty
> vague if you look at the TEXTAREA section. 

It was somewhat less vague in the 'HTML 3.2 Reference Specification':

From



TEXTAREA multi-line text fields
[...]
It is recommended that user agents canonicalize line endings to CR,   
LF (ASCII decimal 13, 10) when submitting the field's contents. The 
character set for submitted data should be ISO Latin-1, unless the 
server has previously indicated that it can support alternative 
character sets. 

-- 
felix
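
Since the spec text above only recommends CR LF, a script can accept all three conventions and settle on one before doing anything else. A small sketch, with normalize_newlines as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: map CR LF pairs and bare CRs to a single LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012?/\012/g;
    return $text;
}

my $clean = normalize_newlines("one\015\012two\015three\012four");
print join('|', split /\012/, $clean), "\n";   # one|two|three|four
```

Unlike collapsing runs of line endings, this keeps empty lines intact: two CR LF pairs in a row become two LFs.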





RE: Translating newlines to HTML paragraphs

2002-05-21 Thread Bob Showalter

> -Original Message-
> From: Jake [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 20, 2002 5:18 PM
> To: John Brooking; Beginners CGI
> Subject: Re: Translating newlines to HTML paragraphs
> 
> ... My guess though is that textarea newlines will get sent 
> as cr/lf no matter what OS the client is running.

This is dependent on the browser, and not the client OS. The
HTML standard would be controlling here, and it's pretty
vague if you look at the TEXTAREA section. The discussion of
line breaks in general starts at:

<http://www.w3.org/TR/html401/struct/text.html#h-9.3.4>





Re: Translating newlines to HTML paragraphs

2002-05-20 Thread drieux


On Monday, May 20, 2002, at 07:36 , John Brooking wrote:

> --- drieux <[EMAIL PROTECTED]> wrote:
>>  use CGI qw/:standard/;
>>
>> for a specific illustration cf:
>>
> http://www.wetware.com/drieux/pbl/cgi/basicPagePopper.txt
[..]
>  The code I included in the message was just
> my test script for developing the regex, but in
> reality, I'll be getting the input from a POST
> parameter originating from a TEXTAREA form tag.

mea culpa - still learning.

I wanted to show an iterative process - that
would be able to collect a bunch of data from
a source, and put each line of data into a
parenthesis 

you want

my $input = "Para 1.\nPara 2.\nPara 3.";  # we want to blow the \n out

my $p    = '<p>';
my $endP = '</p>';
my $nl   = '\n';

$input =~ s/$nl/$endP$p/g;
my $output = $p . $input . $endP;
print "$output\n";

which generates

<p>Para 1.</p><p>Para 2.</p><p>Para 3.</p>

sorry for not getting it the first time around.

ciao
drieux

---






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread John Brooking

--- drieux <[EMAIL PROTECTED]> wrote:
> 
> maybe I am missing something here - but isn't
> this something you would want to be using say
> 
>   use CGI qw/:standard/;
> 
> for a specific illustration cf:
>
http://www.wetware.com/drieux/pbl/cgi/basicPagePopper.txt

Either you're missing something or I am. I looked at
your page, and at the CPAN doc'n for Inline::File
(which was new to me), but I don't see how that will
help me. The code I included in the message was just
my test script for developing the regex, but in
reality, I'll be getting the input from a POST
parameter originating from a TEXTAREA form tag. So I
don't see how that can fit the structure that
Inline::File is expecting, i.e. data at the end of the
script file.

See the immediately prior messages between Jake and me
for more about what I'm trying to do.

- John






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread John Brooking

--- Jake <[EMAIL PROTECTED]>
wrote:
> 
> On my machine (linux) if I dump textarea input to a
> ascii text file like so...
> 
> my $ta = $query->param('myTextArea');
> print outFile $ta;
> 
> newlines are saved as cr/lf which corresponds to the
> hex characters 0D and 0A.  
>
> ... [snip stuff about end of lines] ...
>

Yes, I know about end-of-lines on different systems,
and in fact I have already verified that I was getting
0D/0A combinations from the textarea, but I wasn't
sure if that was just because I'm using ActiveState on
a Windows machine. Thanks for relating your results.

> anyway, my point would be that maybe it's better to
> test for the hex codes 0D 
> and 0A instead of looking for '^'s and such...then
> replace them with a <p>,
> and do the reverse when you want to send the info
> back to a text area.

That's a good idea. 

> I dont know how to test against the hex codes
> though...anybody?

According to the handy O'Reilly Perl 5 Pocket
Reference (highly recommended), "Patterns are
processed as double-quoted strings, so standard string
escapes (see page 7) have their usual meaning." On
page 7, we are reminded of "\xdd" where "dd" is a
two-digit hex number, such as "\x0d" or "\x0a". Or
maybe even just "\n" would suffice. So I'll try those
tomorrow (it's bedtime now) and let you know what I
come up with.
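
A minimal sketch of hex escapes used inside a pattern, for anyone who wants to try it right away. The sample string is made up:

```perl
use strict;
use warnings;

my $text = "line one\x0d\x0aline two";

# \x0d and \x0a work inside a pattern just as in a double-quoted string.
if ($text =~ /\x0d\x0a/) {
    print "found a CR/LF pair\n";
}

# Count the pairs; assigning the match list to () yields the match count.
my $pairs = () = $text =~ /\x0d\x0a/g;
print "pairs: $pairs\n";
```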

> Also, you confused me a bit, I'm assuming that at
> some point you want to 
> display the textarea input in a webpage but not
> necessarily inside another 
> textarea.  If you are only saving from and loading
> into textareas, you would
> never need <p>s or <br>s
> 
> what im thinking of is if someone enters ...
> 
> Hello:
> my name is 
> Bobby
> 
> ...in your textarea, you want to save this to a file
> and then generate a web 
> page from it that will display
> 
> Hello:
> my name is
> Bobby
> 
> and not..
> 
> Hello:my name is Bobby
> 
> is this what you're trying to do or am I way off
> base?

You are exactly right. Sorry if I was vague, I was
trying to find the balance between enough information
and too much. I will be storing the text as records in
a CSV file, for later display on a page *or*
re-editing in a textarea by authors. I'll save it with
the HTML tags, since I can't keep the newlines in a
CSV file record anyway, and just convert it back into
newlines when needed to put back in a textarea for
further editing.

Thanks!






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread Jake


On my machine (linux) if I dump textarea input to an ASCII text file like so...

my $ta = $query->param('myTextArea');
print outFile $ta;

newlines are saved as CR/LF, which corresponds to the hex characters 0D and 0A.
If I look at this file with some text editors the CR will show up as ^M, but it's
not actually a '^' and an 'M', which would correspond to 5E and 4D... does this make
sense?  (if you look at it with a binary editor you will see the 0D and 0A)

0D/0A is the DOS standard way of doing newlines (unix is just 0A i believe),
so since it works this way on a linux box, i'll assume that it's the standard
for All systems ;) you might want to check this to be sure... especially on a
mac.  My guess though is that textarea newlines will get sent as CR/LF no
matter what OS the client is running.

anyway, my point would be that maybe it's better to test for the hex codes 0D
and 0A instead of looking for '^'s and such... then replace them with a <p>,
and do the reverse when you want to send the info back to a text area.

I dont know how to test against the hex codes though...anybody?

Also, you confused me a bit, I'm assuming that at some point you want to 
display the textarea input in a webpage but not necessarily inside another 
textarea.  If you are only saving from and loading into textareas, you would
never need <p>s or <br>s

what im thinking of is if someone enters ...

Hello:
my name is 
Bobby

...in your textarea, you want to save this to a file and then generate a web 
page from it that will display

Hello:
my name is
Bobby

and not..

Hello:my name is Bobby

is this what you're trying to do or am I way off base?

J-






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread John Brooking

True, but if I am trying to repeatedly translate in
both directions, won't this cause the newlines to
grow? For example (P tags shown as [p] here):

  TEXTAREA sends "Para 1.\nPara 2.\nPara 3."

  I store: "Para 1.\n[p]Para 2.\n[p]Para 3."

  Translated back for TEXTAREA next time:
   "Para 1.\n\nPara 2.\n\nPara 3."

  I store next time:
   "Para 1.\n\n[p]Para 2.\n\n[p]Para 3."

  You see where I'm going with this?

  And plus, I'm also getting newlines added at the end
because that also matches "$", although I don't want
those. I could remove them afterwards, but that's not
very elegant. :-)
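
One way around the growth problem is to consume the newline when storing and put it back when loading. A sketch under the assumption that a plain <p> marks each break; to_stored and to_textarea are made-up helper names:

```perl
use strict;
use warnings;

# Hypothetical helpers. The newline is consumed on the way in and
# restored on the way out, so repeated round trips do not add newlines.
sub to_stored {
    my ($text) = @_;
    $text =~ s/\n/<p>/g;
    return $text;
}

sub to_textarea {
    my ($stored) = @_;
    $stored =~ s/<p>/\n/g;
    return $stored;
}

my $original = "Para 1.\nPara 2.\nPara 3.";
my $text     = $original;
$text = to_textarea(to_stored($text)) for 1 .. 3;
print $text eq $original ? "unchanged after 3 round trips\n" : "changed\n";
```

The obvious caveat is that a literal <p> typed by the user would come back as a newline, so real input probably wants its markup escaped before this step.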

--- Jake <[EMAIL PROTECTED]>
wrote:
> 
> On Monday 20 May 2002 01:51 pm, John Brooking wrote:
> > Hello, all,
> 
> > has worked so far. Everything I've tried has (1)
> added
> > the P tag but not removed the newline, and/or (2)
> also
> 
>   the first case should work, after all, the browser
> will ignore newline and 
> carriage return characters, so it shouldn't matter
> if they are left in.  All 
> you care about is putting a paragraph tag in the
> right place.
> 






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread drieux


On Monday, May 20, 2002, at 10:51 , John Brooking wrote:

>
> Thanks in advance for any help.
>
> - John

maybe I am missing something here - but isn't
this something you would want to be using say

use CGI qw/:standard/;

for a specific illustration cf:

http://www.wetware.com/drieux/pbl/cgi/basicPagePopper.txt


ciao
drieux

---






Re: Translating newlines to HTML paragraphs

2002-05-20 Thread Jake


On Monday 20 May 2002 01:51 pm, John Brooking wrote:
> Hello, all,

> has worked so far. Everything I've tried has (1) added
> the P tag but not removed the newline, and/or (2) also

the first case should work, after all, the browser will ignore newline and 
carriage return characters, so it shouldn't matter if they are left in.  All 
you care about is putting a paragraph tag in the right place.






Translating newlines to HTML paragraphs

2002-05-20 Thread John Brooking

Hello, all,

   I'm trying to translate the value entered in a
TEXTAREA tag to one or more HTML paragraphs. That
means any newlines entered into the text box need to
be turned into P tags by the script. But I'm having
trouble coming up with a regex to do this. I know I
need multiline mode, so I've been trying combinations
involving "s/$/<p>/mg" or "s/^/<p>/mg", but nothing
has worked so far. Everything I've tried has (1) added
the P tag but not removed the newline, and/or (2) also
added one at the end of the string (using $, or at the
beginning using ^) even though there's not a newline
there. The latter seems to be by Perl design, but I
don't want it in this case.

  Here's the test program I'm using to experiment
(warning to HTML email clients, there is an HTML tag
in this code):

my $lf = chr(10);
my $Text = "Para.${lf}Para.${lf}Para.";
$Text =~ s/$/<p>/mg;
print $Text;

  And a follow-up question: When newlines are entered
into a TEXTAREA, is TEXTAREA standardized to use CR/LF
pairs, or just CR, just LF, or does it depend on the
client and/or the server platform? Do we need to worry
about this? Or will the ^ and $ characters work
correctly with any of these combinations so we don't
need to worry about it?

Thanks in advance for any help.

- John
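
Both symptoms described in this message follow from $ being a zero-width assertion. A small sketch comparing it with matching the newline character itself; the <p> in the replacement is assumed, and the variable names are made up:

```perl
use strict;
use warnings;

my $lf = chr(10);

# $ is a zero-width assertion: with /m it matches before each newline
# and at the very end of the string, and never consumes the newline.
my $anchored = "Para.${lf}Para.${lf}Para.";
$anchored =~ s/$/<p>/mg;

# Matching the newline character itself replaces it, and nothing is
# added at the end of the string because no newline is there.
my $consumed = "Para.${lf}Para.${lf}Para.";
$consumed =~ s/$lf/<p>/g;

# Show both results on one line each, with newlines made visible.
for my $result ($anchored, $consumed) {
    (my $shown = $result) =~ s/\n/\\n/g;
    print "$shown\n";
}
```

The first result keeps every newline and gains a tag at the end of the string; the second has the newlines replaced and no trailing tag.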

