Re: HTML::Parser

2003-08-14 Thread Rhys Hopkins
Joel Bernstein wrote:
> On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins wrote:
> > > Whilst Data Munging with Perl is, of course, a fine book, in
> >
> > With a fine title.
> >
> > Following the recent discussion on the pronunciation of
> > regex / regexp, this is something that has intrigued me for some time,
> > mung - ing  as in mung beans, or
> > munj - ing  as in sponge ?
> > Put another way do you mung, or munge the data ?
>
> I've always understood it to rhyme with monger, which is the mung bean
> sense. As in fishmonger.
> I thought that perlmongers are generally people who both munge and mong
> perl. In the sense of working with it, and getting paid for it.
> Does that make any sense?


er... I think I was happier with Dave's answer, but I believe
I understand what you are getting at. I don't think there was
ever any doubt that perlmongers are like fishmongers - i.e.
persons who mong perl or fish respectively. I am sure there
are datamongers out there too, who mong data.
From what I gather, though, monging and munging are distinct
activities, and by all accounts, this is reflected in the
pronunciation.


Rhys.


/joel









HTML::Parser

2003-08-04 Thread Andy Williams (IMAP HILLWAY)
Hi,

I need to parse an HTML file [0] and pull out all the form elements and
put them into a data structure. What I can't seem to do is when I have
found a select tag is then parse the associated option tags!

So far I have the following...

<perl>
use strict;
use Data::Dumper;

# Assumed setup, not shown in the original post: the parser object and
# an input handle reading the HTML file named on the command line.
my $p = FormParser->new;
open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

while (<IN>) {
    $p->parse($_);
}
$p->eof;

print Dumper($p->htmltree);


package FormParser;
use base 'HTML::Parser';

my $HTMLTREE = {};

# Called for every start tag (old-style subclassing interface).
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;

    # Group the alternatives so that ^ and $ anchor all of them.
    if ($tag =~ /^(?:form|input|textarea)$/) {

        my $name = $attr->{name};
        foreach my $i (@$attrseq) {
            next if ($i =~ /name/);
            $HTMLTREE->{$name}->{$i} = $attr->{$i};
        }

    } elsif ($tag =~ /^select$/) {

        my $name = $attr->{name};
        warn "FOUND: $tag : $name\n";
        foreach my $i (@$attrseq) {
            next if ($i =~ /name/);
            $HTMLTREE->{$name}->{$i} = $attr->{$i};
        }

        # Now need to assign the options to each select


    }
}

sub text {
    my ($self, $text) = @_;
    #print $text;
}

sub end {
    my ($self, $tag, $origtext) = @_;
    #print $origtext;
}

sub htmltree {
    return $HTMLTREE;
}
</perl>


Andy

[0] - here's the HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title>Form Test</title>
</head>
<body>
<form action="/cgi-bin/test.pl" method="post">
<input type="text" name="forename"><br>
<input type="text" name="surname"><br>
<input type="hidden" name="cid" value="1234"><br>
<input type="radio" name="rad_1" value="Y">Yes<input
type="radio" name="rad_1" value="N">No<br>
<input type="radio" name="rad_2" value="1">One<input
type="radio" name="rad_2" value="2">Two<br>
<input type="checkbox" name="check_1" value="1">Check 1
<input type="checkbox" name="check_2" value="2">Check 2
<input type="checkbox" name="check_3" value="3">Check 3<br>
Password:<input type="password" name="password"><br>
<textarea name="text_area"></textarea>
<select name="single_select">
<option name="1" value="1">1</option>
<option name="2" value="2">2</option>
</select>
<select name="multiple_select" multiple>
<option name="1" value="1">1</option>
<option name="2" value="2">2</option>
<option name="3" value="3">3</option>
<option name="4" value="4">4</option>
</select>
<br>
<input type="button" name="button">
<input type="submit" name="submit_btn">
<input type="reset" name="reset">

</form>
</body>
</html>




Re: HTML::Parser

2003-08-04 Thread Dave Cross

From: Andy Williams (IMAP HILLWAY) [EMAIL PROTECTED]
Date: 8/4/03 1:16:52 PM

 Hi,

 I need to parse an HTML file [0] and pull out all the form
 elements and put them into a data structure. What I can't 
 seem to do is when I have found a select tag is then 
 parse the associated option tags!

 So far I have the following...

 <perl>
 use strict;
 use Data::Dumper;
 while (<IN>) {
     $p->parse($_);
 }
 $p->eof;

 print Dumper($p->htmltree);


 package FormParser;
 use base 'HTML::Parser';

[ snip ]

Well firstly, you're using the very old (and nasty) HTML::Parser
syntax. It all got a lot nicer (and easier to use[1]) with HTML::Parser
version 3.
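
A minimal sketch of the version 3 style, for illustration only. The sample HTML and the %fields hash are made up for this example; the point is that handlers are passed to new() as code references, so no subclass is needed.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample input, invented for this sketch.
my $html = '<form action="/cgi-bin/test.pl" method="post">'
         . '<input type="text" name="forename">'
         . '<textarea name="text_area"></textarea></form>';

my %fields;    # field name => hash of its attributes plus the tag name

# Version 3 API: each handler is a code reference plus an "argspec"
# string naming the values the handler should be called with.
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tag, $attr) = @_;
            return unless $tag =~ /^(?:input|textarea)$/;
            return unless defined $attr->{name};
            $fields{ $attr->{name} } = { %$attr, tag => $tag };
        },
        'tagname, attr',
    ],
);

$p->parse($html);
$p->eof;
```

Events without a handler (text, end tags and so on) are simply not reported here.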

And secondly, if you're trying to build a tree based on the HTML
elements, then you might be far better off using HTML::TreeBuilder.
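
As a rough sketch of what that could look like (the sample HTML and the shape of %fields are assumptions made for this example, not anything from the original post), HTML::TreeBuilder parses the whole document and look_down then finds the elements of interest:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input, invented for this sketch.
my $html = <<'HTML';
<html><body>
<form action="/cgi-bin/test.pl" method="post">
<input type="text" name="forename">
<select name="single_select">
<option value="1">One</option>
<option value="2">Two</option>
</select>
</form>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# In scalar context look_down returns the first match (or undef).
my $form = $tree->look_down(_tag => 'form')
    or die "No form element found";

my %fields;    # field name => { tag => ..., options => [...] }

# A code reference criterion: keep elements whose tag is a form control.
for my $el ($form->look_down(sub { $_[0]->tag =~ /^(?:input|select|textarea)$/ })) {
    my $name = $el->attr('name');
    next unless defined $name;

    my %info = (tag => $el->tag);
    if ($el->tag eq 'select') {
        # The options are descendants of the select element itself.
        $info{options} = [ map { $_->attr('value') } $el->look_down(_tag => 'option') ];
    }
    $fields{$name} = \%info;
}

$tree->delete;    # release the tree when finished with it
```

Because the tree already records which option belongs to which select, there is no state to track by hand.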

Dave...

[1] Like you no longer need to subclass it.
-- 
http://www.dave.org.uk

Let me see you make decisions, without your television
   - Depeche Mode (Stripped)







Re: HTML::Parser

2003-08-04 Thread Robin Berjon
Andy Williams (IMAP HILLWAY) wrote:
 I need to parse an HTML file [0] and pull out all the form elements and
 put them into a data structure. What I can't seem to do is when I have
 found a select tag is then parse the associated option tags!

When you see the start select, you set a flag to say you're in a select. Then 
when you see an option, if you're in a select you add its data to the structure 
that corresponds to the current select. On the end of the select, you clear it 
(to avoid dealing with stray options).
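
The flag approach could be sketched roughly as follows. This version uses the HTML::Parser version 3 handler interface rather than the subclass from the original post, and the sample HTML, %selects and $current_select are names invented for the example.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample input, invented for this sketch; the last option is a stray one.
my $html = <<'HTML';
<form>
<select name="single_select">
<option value="1">1</option>
<option value="2">2</option>
</select>
<option value="9">stray</option>
</form>
HTML

my %selects;           # select name => list of option values
my $current_select;    # the "flag": name of the select we are inside, if any

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tag, $attr) = @_;
            if ($tag eq 'select' && defined $attr->{name}) {
                # Entering a select: set the flag and start its list.
                $current_select = $attr->{name};
                $selects{$current_select} = [];
            }
            elsif ($tag eq 'option' && defined $current_select) {
                # Only options seen while the flag is set are recorded.
                push @{ $selects{$current_select} }, $attr->{value};
            }
        },
        'tagname, attr',
    ],
    end_h => [
        sub {
            my ($tag) = @_;
            # Leaving the select: clear the flag so stray options are ignored.
            undef $current_select if $tag eq 'select';
        },
        'tagname',
    ],
);

$p->parse($html);
$p->eof;
```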

--
Robin Berjon [EMAIL PROTECTED]
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488



Re: HTML::Parser

2003-08-04 Thread Shevek
On Mon, 4 Aug 2003, Andy Williams (IMAP HILLWAY) wrote:

 I need to parse an HTML file [0] and pull out all the form elements and
 put them into a data structure. What I can't seem to do is when I have
 found a select tag is then parse the associated option tags!

Try HTML::PullParser. The invocation system might be easier for what
you're trying to do.
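
A hedged sketch of the pull style, with sample HTML and variable names invented for the example. Instead of the parser calling handlers, the code asks for the next token, so handling the options of a select becomes an inner loop rather than a flag.

```perl
use strict;
use warnings;
use HTML::PullParser;

# Sample input, invented for this sketch.
my $html = <<'HTML';
<form>
<select name="multiple_select" multiple>
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
</select>
</form>
HTML

# Only the events listed here are reported; each token is an array
# reference laid out as described by the argspec string.
my $p = HTML::PullParser->new(
    doc   => \$html,
    start => 'event, tagname, attr',
    end   => 'event, tagname',
);

my %selects;    # select name => list of option values

while (my $token = $p->get_token) {
    my ($event, $tag, $attr) = @$token;
    next unless $event eq 'start' && $tag eq 'select';

    my $name = $attr->{name};
    next unless defined $name;
    $selects{$name} = [];

    # Keep pulling tokens until this select is closed.
    while (my $inner = $p->get_token) {
        my ($ev, $t, $at) = @$inner;
        last if $ev eq 'end' && $t eq 'select';
        push @{ $selects{$name} }, $at->{value}
            if $ev eq 'start' && $t eq 'option';
    }
}
```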

S.

-- 
Shevek http://www.anarres.org/
I am the Borg. http://www.gothnicity.org/



RE: HTML::Parser

2003-08-04 Thread Andy Williams (IMAP HILLWAY)


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross
 Sent: 04 August 2003 14:34
 To: [EMAIL PROTECTED]
 Subject: Re: HTML::Parser
 
 Well firstly, you're using the very old (and nasty) 
 HTML::Parser syntax. It all got a lot nicer (and easier to 
 use[1]) with HTML::Parser version 3.
 
 And secondly, if you're trying to build a tree based on the 
 HTML elements, then you might be far better off using 
 HTML::TreeBuilder.
 
 Dave...
 
 [1] Like you no longer need to subclass it.

Ahhh, I like HTML::TreeBuilder. It does exactly what it says on the
tin.
Wonderful. 
Thanks Dave


Andy




Re: HTML::Parser

2003-08-04 Thread Sam Vilain
 I need to parse an HTML file [0] and pull out all the form elements
 and put them into a data structure. What I can't seem to do is when I
 have found a select tag is then parse the associated option tags!

If you're getting valid xhtml:

use XML::Sablotron::DOM;

my $situa = new XML::Sablotron::Situation();
my $doc = new XML::Sablotron::DOM::Document(SITUATION => $situa);

$doc->parseBuffer($situa, $buffer);

my @options = $doc->xql("//form//select//option");

-- 
Sam Vilain, [EMAIL PROTECTED]

  The best mirror is a friend's eye.
GAELIC PROVERB





RE: HTML::Parser

2003-08-04 Thread Andy Williams (IMAP HILLWAY)


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross
 Sent: 04 August 2003 14:34
 To: [EMAIL PROTECTED]
 Subject: Re: HTML::Parser
 
 And secondly, if you're trying to build a tree based on the 
 HTML elements, then you might be far better off using 
 HTML::TreeBuilder.
 

My earlier exclamation about TreeBuilder has been short lived because I
am having one of those days where my brain has stopped working! [0]

I now have a nice data structure with the HTML in. How do I access it?
This sounds very simple but for the life of me I just can't get at what
I need.

All I want is what is included between the form tags.

I think I have to use the traverse method, but I'm not sure how to do
this[1].


Andy

[0] - Becoming far too frequent and mainly seem to be on Mondays.
[1] - Even after reading Dave's wonderful Data Munging in Perl [2]
[2] - Blatant plug!




Re: HTML::Parser

2003-08-04 Thread Dave Cross

From: Andy Williams (IMAP HILLWAY) [EMAIL PROTECTED]
Date: 8/4/03 2:49:12 PM

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross
 Sent: 04 August 2003 14:34
 To: [EMAIL PROTECTED]
 Subject: Re: HTML::Parser
 
 And secondly, if you're trying to build a tree based on 
 the HTML elements, then you might be far better off using
 HTML::TreeBuilder.
 
 My earlier exclamation about TreeBuilder has been short 
 lived because I am having one of those days where my brain
 has stopped working! [0]

 I now have a nice data structure with the HTML in. How do 
 I access it? This sounds very simple but for the life of 
 me I just can't get at what I need.

 All I want is what is included between the form tags.

 I think I have to use the traverse method, but I'm not 
 sure how to do this[1].

 [0] - Becoming far too frequent and mainly seem to be on 
 Mondays.
 [1] - Even after reading Dave's wonderful Data Munging in
 Perl [2]
 [2] - Blatant plug!

Whilst Data Munging with Perl is, of course, a fine book, in
this case you'll be better off with Sean Burke's Perl & LWP.
If you don't have access to a copy then Sean's TPJ article "Scanning
HTML" http://www.samag.com/documents/s=1272/sam05030008/ is
a pretty good introduction.

I think the method you need is probably as_text.
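
For illustration, a small sketch of getting at the form once the tree is built. The sample HTML is invented; as_text gives only the text inside the element, while as_HTML (shown for comparison, and not something suggested in the thread) serialises the element with its markup.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input, invented for this sketch.
my $html = '<html><body><h1>Form Test</h1>'
         . '<form action="/cgi-bin/test.pl" method="post">'
         . 'Password:<input type="password" name="password">'
         . '</form></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# In scalar context look_down returns the first matching element (or undef).
my $form = $tree->look_down(_tag => 'form')
    or die "No form element found";

my $text   = $form->as_text;    # text inside the form, markup dropped
my $markup = $form->as_HTML;    # the form element serialised back to HTML

print "$text\n$markup\n";

$tree->delete;    # release the tree when finished with it
```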

Dave...

-- 
http://www.dave.org.uk

Let me see you make decisions, without your television
   - Depeche Mode (Stripped)







RE: HTML::Parser

2003-08-04 Thread Andy Williams (IMAP HILLWAY)


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross
 Sent: 04 August 2003 16:06
 To: [EMAIL PROTECTED]
 Subject: Re: HTML::Parser
 
 
 
 Whilst Data Munging with Perl is, of course, a fine book, 
 in this case you'll be better off with Sean Burke's Perl & 
 LWP. If you don't have access to a copy then Sean's TPJ article
 "Scanning HTML" http://www.samag.com/documents/s=1272/sam05030008/
 is a pretty good introduction.
 
 I think the method you need is probably as_text.
 
 Dave...
 
 -- 

I think I have most O'Reilly books so I'm sure I have that somewhere.
Thanks for the pointer.

Andy




Re: HTML::Parser

2003-08-04 Thread Rhys Hopkins

 Whilst Data Munging with Perl is, of course, a fine book, in
With a fine title.

Following the recent discussion on the pronunciation of
regex / regexp, this is something that has intrigued me for some time,
mung - ing  as in mung beans, or
munj - ing  as in sponge ?
Put another way do you mung, or munge the data ?

Rhys.





Re: HTML::Parser

2003-08-04 Thread Nigel Rantor

 Put another way do you mung, or munge the data ?

Ah, at last a question I care about.

It is of course mung.*

  N

* For all values of mung where mung is mung**

** Oh, that's pronounced mung by the way***

*** Oh, okay, as in munge[0][1]

[0] Hmm, different footnote syntax than I normally use.

[1] So, no really, I say munj




Re: HTML::Parser

2003-08-04 Thread Joel Bernstein
On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins wrote:
 
  Whilst Data Munging with Perl is, of course, a fine book, in
 
 With a fine title.
 
 Following the recent discussion on the pronunciation of
 regex / regexp, this is something that has intrigued me for some time,
 
 mung - ing  as in mung beans, or
 munj - ing  as in sponge ?
 
 Put another way do you mung, or munge the data ?

I've always understood it to rhyme with monger, which is the mung bean
sense. As in fishmonger.

I thought that perlmongers are generally people who both munge and mong
perl. In the sense of working with it, and getting paid for it.

Does that make any sense?
 
 Rhys.

/joel



Re: HTML::Parser

2003-08-04 Thread Simon Wilcox
On Mon, 4 Aug 2003, Rhys Hopkins wrote:

 Following the recent discussion on the pronunciation of
 regex / regexp, this is something that has intrigued me for some time,
 
 mung - ing  as in mung beans, or
 munj - ing  as in sponge ?
 
 Put another way do you mung, or munge the data ?

Surely from the verb "to munge", so the pronunciation should be mun-jing, 
with jing as in jingle.

http://dictionary.reference.com/search?q=munge

S.




Re: HTML::Parser

2003-08-04 Thread Elaine -HFB- Ashton
Rhys Hopkins [EMAIL PROTECTED] quoth:
*
*Put another way do you mung, or munge the data ?

mung is a fun word in American English :)

The dictionary of american regional english defines it as

mang or mung

1. 1884 Amer. Philol. Assoc. Trans. for 1883 14.51 WV, Mang means in West
Virginia the 'slush about a pig-sty' 1994 DARE File, usually spelled mung,
but not often used in writing, was a synonym for crud, guck, for a messy
substance of infinite repulsiveness but little specificity, by UMASS
students in the 1970s.

mung is also used as a verb when you break or mangle something, e.g. Boy,
I sure munged that up good 'n proper :)

DARE also has an entry for 'munger' which is a derivative of 'mongrel'
also 'mongler'. :)

So, one could draw the conclusion that to munge is to mongrelize your
data...or to mung it all to hell and back :) 

Good thing it isn't used as 'Perl Monglers' :)

e.



Re: HTML::Parser

2003-08-04 Thread Dave Cross
On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins ([EMAIL PROTECTED]) wrote:
 
  Whilst Data Munging with Perl is, of course, a fine book, in
 
 With a fine title.
 
 Following the recent discussion on the pronunciation of
 regex / regexp, this is something that has intrigued me for some time,
 
 mung - ing  as in mung beans, or
 munj - ing  as in sponge ?
 
 Put another way do you mung, or munge the data ?

It's definitely munj.

Dave...

-- 
  Love is a fire of flaming brandy
  Upon a crepe suzette