Re: HTML::Parser
Joel Bernstein wrote: On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins wrote: Whilst Data Munging with Perl is, of course, a fine book, in With a fine title. Following the recent discussion on the pronunciation of regex / regexp, this is something that has intrigued me for some time, mung - ing as in mung beans, or munj - ing as in sponge ? Put another way do you mung, or munge the data ? I've always understood it to rhyme with monger, which is the mung bean sense. As in fishmonger. I thought that perlmongers are generally people who both munge and mong perl. In the sense of working with it, and getting paid for it. Does that make any sense? er... I think I was happier with Dave's answer, but I believe I understand what you are getting at. I don't think there was ever any doubt that perlmongers are like fishmongers - i.e. persons who mong perl or fish respectively. I am sure there are datamongers out there too, who mong data. From what I gather, though, monging and munging are distinct activities, and by all accounts, this is reflected in the pronunciation. Rhys. /joel
HTML::Parser
Hi,

I need to parse an HTML file [0] and pull out all the form elements and put them into a data structure. What I can't seem to do, once I have found a select tag, is parse the associated option tags!

So far I have the following...

<perl>
use strict;
use Data::Dumper;

# $p (a FormParser object) and the IN filehandle are set up
# elsewhere; that part is not shown in this post.
while (<IN>) {
    $p->parse($_);
}
$p->eof;

print Dumper($p->htmltree);

package FormParser;
use base 'HTML::Parser';

my $HTMLTREE = {};

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    # Group the alternation so that ^ and $ apply to every tag name.
    if ($tag =~ /^(?:form|input|textarea)$/) {
        my $name = $attr->{name};
        foreach my $i (@$attrseq) {
            next if ($i =~ /name/);
            $HTMLTREE->{$name}->{$i} = $attr->{$i};
        }
    }
    elsif ($tag =~ /^select$/) {
        my $name = $attr->{name};
        warn "FOUND: $tag : $name\n";
        foreach my $i (@$attrseq) {
            next if ($i =~ /name/);
            $HTMLTREE->{$name}->{$i} = $attr->{$i};
        }
        # Now need to assign the options to each select
    }
}

sub text {
    my ($self, $text) = @_;
    #print $text;
}

sub end {
    my ($self, $tag, $origtext) = @_;
    #print $origtext;
}

sub htmltree {
    return $HTMLTREE;
}
</perl>

Andy

[0] - here's the HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title>Form Test</title>
</head>
<body>
<form action="/cgi-bin/test.pl" method="post">
<input type="text" name="forename"><br>
<input type="text" name="surname"><br>
<input type="hidden" name="cid" value="1234"><br>
<input type="radio" name="rad_1" value="Y">Yes<input type="radio" name="rad_1" value="N">No<br>
<input type="radio" name="rad_2" value="1">One<input type="radio" name="rad_2" value="2">Two<br>
<input type="checkbox" name="check_1" value="1">Check 1
<input type="checkbox" name="check_2" value="2">Check 2
<input type="checkbox" name="check_3" value="3">Check 3<br>
Password:<input type="password" name="password"><br>
<textarea name="text_area"></textarea>
<select name="single_select">
<option name="1" value="1">1</option>
<option name="2" value="2">2</option>
</select>
<select name="multiple_select" multiple>
<option name="1" value="1">1</option>
<option name="2" value="2">2</option>
<option name="3" value="3">3</option>
<option name="4" value="4">4</option>
</select>
<br>
<input type="button" name="button">
<input type="submit" name="submit_btn">
<input type="reset" name="reset">
</form>
</body>
</html>
Re: HTML::Parser
From: Andy Williams (IMAP HILLWAY) [EMAIL PROTECTED] Date: 8/4/03 1:16:52 PM Hi, I need to parse an HTML file [0] and pull out all the form elements and put them into a data structure. What I can't seem to do is when I have found a select tag is then parse the associated option tags! So far I have the following... <perl> use strict; use Data::Dumper; while (<IN>) { $p->parse($_); } $p->eof; print Dumper($p->htmltree); package FormParser; use base 'HTML::Parser'; [ snip ] Well firstly, you're using the very old (and nasty) HTML::Parser syntax. It all got a lot nicer (and easier to use[1]) with HTML::Parser version 3. And secondly, if you're trying to build a tree based on the HTML elements, then you might be far better off using HTML::TreeBuilder. Dave... [1] Like you no longer need to subclass it. -- http://www.dave.org.uk Let me see you make decisions, without your television - Depeche Mode (Stripped)
Re: HTML::Parser
Andy Williams (IMAP HILLWAY) wrote: I need to parse an HTML file [0] and pull out all the form elements and put them into a data structure. What I can't seem to do is when I have found a select tag is then parse the associated option tags! When you see the start of a select, you set a flag to say you're in a select. Then when you see an option, if you're in a select you add its data to the structure that corresponds to the current select. At the end of the select, you clear the flag (to avoid dealing with stray options). -- Robin Berjon [EMAIL PROTECTED] Research Engineer, Expway http://expway.fr/ 7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
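A minimal sketch of the flag approach described above, written against the HTML::Parser version 3 handler interface so that no subclass is needed. The sample HTML and the names %form and $current_select are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; the original post reads the HTML from a file.
my $html = <<'HTML';
<form action="/cgi-bin/test.pl" method="post">
<select name="single_select">
<option value="1">One</option>
<option value="2">Two</option>
</select>
<option value="9">stray</option>
</form>
HTML

my %form;            # select name => list of option values
my $current_select;  # the "flag": name of the select we are inside, if any

my $p = HTML::Parser->new(
    api_version => 3,
    # Start tags: remember which select we are in, collect its options.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'select') {
            $current_select = defined $attr->{name} ? $attr->{name} : '';
            $form{$current_select} = [];
        }
        elsif ($tag eq 'option' && defined $current_select) {
            push @{ $form{$current_select} }, $attr->{value};
        }
    }, 'tagname, attr' ],
    # End tags: clear the flag so stray options are ignored.
    end_h => [ sub {
        my ($tag) = @_;
        undef $current_select if $tag eq 'select';
    }, 'tagname' ],
);

$p->parse($html);
$p->eof;
```

The option that appears after the closing select tag is skipped because the flag has been cleared by then.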
Re: HTML::Parser
On Mon, 4 Aug 2003, Andy Williams (IMAP HILLWAY) wrote: I need to parse an HTML file [0] and pull out all the form elements and put them into a data structure. What I can't seem to do is when I have found a select tag is then parse the associated option tags! Try HTML::PullParser. The invocation system might be easier for what you're trying to do. S. -- Shevek http://www.anarres.org/ I am the Borg. http://www.gothnicity.org/
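To illustrate why the pull style may fit this problem, here is a rough sketch with HTML::PullParser: because the caller asks for tokens, an inner loop can simply read ahead until the closing select tag. The sample HTML and the %options hash are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::PullParser;

# Hypothetical sample input standing in for the HTML file.
my $html = <<'HTML';
<form action="/cgi-bin/test.pl" method="post">
<select name="single_select">
<option value="1">One</option>
<option value="2">Two</option>
</select>
</form>
HTML

# Ask only for start and end events; each token is an array reference.
my $p = HTML::PullParser->new(
    doc   => \$html,
    start => 'event, tagname, attr',
    end   => 'event, tagname',
) or die "Cannot create parser: $!";

my %options;  # select name => list of option values

while (my $token = $p->get_token) {
    my ($event, $tag, $attr) = @$token;
    next unless $event eq 'start' && $tag eq 'select';

    my $name = defined $attr->{name} ? $attr->{name} : '';
    $options{$name} = [];

    # Inner loop: consume tokens until the matching end tag of the select.
    while (my $inner = $p->get_token) {
        my ($ievent, $itag, $iattr) = @$inner;
        last if $ievent eq 'end' && $itag eq 'select';
        push @{ $options{$name} }, $iattr->{value}
            if $ievent eq 'start' && $itag eq 'option';
    }
}
```

No flag variable is needed here, since the position in the code records whether the parser is inside a select.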
RE: HTML::Parser
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross Sent: 04 August 2003 14:34 To: [EMAIL PROTECTED] Subject: Re: HTML::Parser Well firstly, you're using the very old (and nasty) HTML::Parser syntax. It all got a lot nicer (and easier to use[1]) with HTML::Parser version 3. And secondly, if you're trying to build a tree based on the HTML elements, then you might be far better off using HTML::TreeBuilder. Dave... [1] Like you no longer need to subclass it. Ahhh, I like HTML::TreeBuilder. It does exactly what it says on the tin. Wonderful. Thanks Dave. Andy
Re: HTML::Parser
I need to parse an HTML file [0] and pull out all the form elements and put them into a data structure. What I can't seem to do is when I have found a select tag is then parse the associated option tags!

If you're getting valid xhtml:

    use XML::Sablotron::DOM;

    # $buffer holds the XHTML source as a string
    my $situa = new XML::Sablotron::Situation();
    my $doc   = new XML::Sablotron::DOM::Document(SITUATION => $situa);
    $doc->parseBuffer($situa, $buffer);

    my @options = $doc->xql("//form//select//option");

-- Sam Vilain, [EMAIL PROTECTED] The best mirror is a friend's eye. GAELIC PROVERB
RE: HTML::Parser
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross Sent: 04 August 2003 14:34 To: [EMAIL PROTECTED] Subject: Re: HTML::Parser And secondly, if you're trying to build a tree based on the HTML elements, then you might be far better off using HTML::TreeBuilder. My earlier exclamation about TreeBuilder has been short lived because I am having one of those days where my brain has stopped working! [0] I now have a nice data structure with the HTML in. How do I access it? This sounds very simple but for the life of me I just can't get at what I need. All I want is what is included between the form tags. I think I have to use the traverse method, but I'm not sure how to do this[1]. Andy [0] - Becoming far too frequent and mainly seem to be on Mondays. [1] - Even after reading Dave's wonderful Data Munging in Perl [2] [2] - Blatant plug!
Re: HTML::Parser
From: Andy Williams (IMAP HILLWAY) [EMAIL PROTECTED] Date: 8/4/03 2:49:12 PM -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross Sent: 04 August 2003 14:34 To: [EMAIL PROTECTED] Subject: Re: HTML::Parser And secondly, if you're trying to build a tree based on the HTML elements, then you might be far better off using HTML::TreeBuilder. My earlier exclamation about TreeBuilder has been short lived because I am having one of those days where my brain has stopped working! [0] I now have a nice data structure with the HTML in. How do I access it? This sounds very simple but for the life of me I just can't get at what I need. All I want is what is included between the form tags. I think I have to use the traverse method, but I'm not sure how to do this[1]. [0] - Becoming far too frequent and mainly seem to be on Mondays. [1] - Even after reading Dave's wonderful Data Munging in Perl [2] [2] - Blatant plug! Whilst Data Munging with Perl is, of course, a fine book, in this case you'll be better off with Sean Burke's Perl LWP. If you don't have access to a copy then Sean's TPJ article Scanning HTML http://www.samag.com/documents/s=1272/sam05030008/ is a pretty good introduction. I think the method you need is probably as_text. Dave... -- http://www.dave.org.uk Let me see you make decisions, without your television - Depeche Mode (Stripped)
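For getting at what sits between the form tags, one possible route is the look_down method that HTML::TreeBuilder objects inherit from HTML::Element, with as_text used for the visible label of each option. This is only a sketch under assumptions: the sample HTML, the %fields hash and its keys (tag, options, labels) are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input standing in for the HTML file.
my $html = <<'HTML';
<form action="/cgi-bin/test.pl" method="post">
<input type="text" name="forename"><br>
<select name="single_select">
<option value="1">One</option>
<option value="2">Two</option>
</select>
</form>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

my %fields;  # field name => { tag => ..., options => [...], labels => [...] }

for my $form ($tree->look_down(_tag => 'form')) {
    # Every input, textarea or select element below this form.
    for my $el ($form->look_down(_tag => qr/^(?:input|textarea|select)$/)) {
        my $name = $el->attr('name');
        next unless defined $name;

        my %info = (tag => $el->tag);
        if ($el->tag eq 'select') {
            my @opts = $el->look_down(_tag => 'option');
            $info{options} = [ map { $_->attr('value') } @opts ];
            $info{labels}  = [ map { $_->as_text } @opts ];
        }
        $fields{$name} = \%info;
    }
}

# Free the tree explicitly; %fields holds only plain data by now.
$tree->delete;
```

Passing %fields to Data::Dumper afterwards gives a structure similar to the one the original subclass was building.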
RE: HTML::Parser
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Cross Sent: 04 August 2003 16:06 To: [EMAIL PROTECTED] Subject: Re: HTML::Parser Whilst Data Munging with Perl is, of course, a fine book, in this case you'll be better off with Sean Burke's Perl LWP. If you don't have access to a copy then Sean's TPJ article Scanning HTML http://www.samag.com/documents/s=1272/sam05030008/ is a pretty good introduction. I think the method you need is probably as_text. Dave... -- I think I have most O'Reilly books so I'm sure I have that somewhere. Thanks for the pointer. Andy
Re: HTML::Parser
Whilst Data Munging with Perl is, of course, a fine book, in With a fine title. Following the recent discussion on the pronunciation of regex / regexp, this is something that has intrigued me for some time, mung - ing as in mung beans, or munj - ing as in sponge ? Put another way do you mung, or munge the data ? Rhys.
Re: HTML::Parser
Put another way do you mung, or munge the data ? Ah, at last a question I care about. It is of course mung.* N * For all values of mung where mung is mung** ** Oh, that's pronounced mung by the way*** *** Oh, okay, as in munge[0][1] [0] Hmm, different footnote syntax than I normally use. [1] So, no really, I say munj
Re: HTML::Parser
On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins wrote: Whilst Data Munging with Perl is, of course, a fine book, in With a fine title. Following the recent discussion on the pronunciation of regex / regexp, this is something that has intrigued me for some time, mung - ing as in mung beans, or munj - ing as in sponge ? Put another way do you mung, or munge the data ? I've always understood it to rhyme with monger, which is the mung bean sense. As in fishmonger. I thought that perlmongers are generally people who both munge and mong perl. In the sense of working with it, and getting paid for it. Does that make any sense? Rhys. /joel
Re: HTML::Parser
On Mon, 4 Aug 2003, Rhys Hopkins wrote: Following the recent discussion on the pronunciation of regex / regexp, this is something that has intrigued me for some time, mung - ing as in mung beans, or munj - ing as in sponge ? Put another way do you mung, or munge the data ? Surely from the verb to munge so the pronunciation should be mun-jing, with jing as in jingle. http://dictionary.reference.com/search?q=munge S.
Re: HTML::Parser
Rhys Hopkins [EMAIL PROTECTED] quoth: Put another way do you mung, or munge the data ? mung is a fun word in American English :) The Dictionary of American Regional English defines it as mang or mung 1. 1884 Amer. Philol. Assoc. Trans. for 1183 14.51 WV, Man means in West. Virginia the 'slush about a pig-sty' 1994 DARE File, usually spelled mung, but not often used in writing, was a synonym for crud, guck, for a messy substance of infinite repulsiveness but little specificity, by UMASS students in the 1970s. mung is also used as a verb when you break or mangle something, e.g. Boy, I sure munged that up good 'n proper :) DARE also has an entry for 'munger', which is a derivative of 'mongrel', also 'mongler'. :) So, one could draw the conclusion that to munge is to mongrelize your data...or to mung it all to hell and back :) Good thing it isn't used as 'Perl Monglers' :) e.
Re: HTML::Parser
On Mon, Aug 04, 2003 at 05:08:39PM +0100, Rhys Hopkins ([EMAIL PROTECTED]) wrote: Whilst Data Munging with Perl is, of course, a fine book, in With a fine title. Following the recent discussion on the pronunciation of regex / regexp, this is something that has intrigued me for some time, mung - ing as in mung beans, or munj - ing as in sponge ? Put another way do you mung, or munge the data ? It's definitely munj. Dave... -- Love is a fire of flaming brandy Upon a crepe suzette