Re: parsing html

2013-08-08 Thread David Precious

On Thu, 8 Aug 2013 22:42:01 +0530
Unknown User knowsuperunkn...@gmail.com wrote:

 What would be the best module available for parsing html in your
 opinion? My intention is to parse html that contains a table of 5
 columns and any number of rows

For parsing HTML tables, you want HTML::TableExtract, IMO.

https://metacpan.org/module/HTML::TableExtract

It makes life easy.
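
A minimal sketch of how HTML::TableExtract might build the kind of hash ref described in the question. The sample HTML, the header names col1 to col5, and the %data layout are assumptions made up for illustration; none of them come from the original post.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample input: a five-column table with a header row.
my $html = <<'HTML';
<table>
<tr><th>col1</th><th>col2</th><th>col3</th><th>col4</th><th>col5</th></tr>
<tr><td>data11</td><td>data12</td><td>data13</td><td>data14</td><td>data15</td></tr>
<tr><td>data21</td><td>data22</td><td>data23</td><td>data24</td><td>data25</td></tr>
</table>
HTML

my @cols = qw(col1 col2 col3 col4 col5);

# Only tables whose header row matches all of these headers are extracted.
my $te = HTML::TableExtract->new( headers => \@cols );
$te->parse($html);

my %data;
my $n = 0;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {       # data rows only, in header order
        $n++;
        @{ $data{$n} }{@cols} = @$row;   # hash slice: column name => cell
    }
}

print "$data{1}{col2}\n";
print "$data{2}{col1}\n";
```

Keying rows by number mirrors the structure asked for; an array of hashes would do the same job if the row numbers carry no meaning.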


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: parsing html

2013-08-08 Thread Chankey Pathak

Have a look at HTML::Parser.
On Aug 8, 2013 10:50 PM, Unknown User knowsuperunkn...@gmail.com wrote:


 What would be the best module available for parsing html in your opinion?
 My intention is to parse html that contains a table of 5 columns and any
 number of rows, and have a hash ref like
 $html->{1}->{col1}=data11, $html->{1}->{col2}=data12 ...
 $html->{2}->{col1}=data21, $html->{2}->{col2}=data22
 ...

 etc
 Would there be an existing module that can do this without too much effort
 on my part?

 Thanks,

Re: parsing html

2013-08-08 Thread timothy adigun

On 8 Aug 2013 18:19, Unknown User knowsuperunkn...@gmail.com wrote:


 What would be the best module available for parsing html in your opinion?

I would also say look at HTML::TreeBuilder

 My intention is to parse html that contains a table of 5 columns and any
 number of rows, and have a hash ref like
 $html->{1}->{col1}=data11, $html->{1}->{col2}=data12 ...
 $html->{2}->{col1}=data21, $html->{2}->{col2}=data22
 ...

 etc
 Would there be an existing module that can do this without too much
 effort on my part?

 Thanks,

Re: parsing html

2013-08-08 Thread Charles DeRykus

On Thu, Aug 8, 2013 at 10:18 AM, David Precious dav...@preshweb.co.uk wrote:

 On Thu, 8 Aug 2013 22:42:01 +0530
 Unknown User knowsuperunkn...@gmail.com wrote:

  What would be the best module available for parsing html in your
  opinion? My intention is to parse html that contains a table of 5
  columns and any number of rows

 For parsing HTML tables, you want HTML::TableExtract, IMO.

 https://metacpan.org/module/HTML::TableExtract

 It makes life easy.




 I'm with David since the stated objective was to extract table info. As
powerful as the other parsing modules are, you certainly would be grinding
out more code.

-- 
Charles DeRykus

Re: parsing html

2013-08-08 Thread Rob Dixon


On 08/08/2013 18:18, David Precious wrote:

 On Thu, 8 Aug 2013 22:42:01 +0530
 Unknown User knowsuperunkn...@gmail.com wrote:

  What would be the best module available for parsing html in your
  opinion? My intention is to parse html that contains a table of 5
  columns and any number of rows

 For parsing HTML tables, you want HTML::TableExtract, IMO.


Yes. It would help if you explained more about what you want to do, and
showed some HTML rather than just the desired data structure, but
HTML::TableExtract seems to be the right module for this.

There are better HTML parsers for general purposes. XML::LibXML will
handle HTML, and HTML::TreeBuilder is nice.

Rob


Re: Parsing HTML file

2008-06-04 Thread Pat Rice

Hi
Try the HTML tree modules on CPAN.
There are also HTML table modules, but I haven't used them.
The tree modules can take a while to understand, but they seem to be pretty good.

Pat


On Mon, Jun 2, 2008 at 12:10 PM, Jeff Peng [EMAIL PROTECTED] wrote:
 On Mon, Jun 2, 2008 at 5:56 PM, Purohit, Bhargav
 [EMAIL PROTECTED] wrote:

 I have a HTML file and which contains many tables in it.
 Out of one of the table I want to extract Information of only few of the
 columns.
 Can anybody help me.

 Nobody can help unless you have tried some coding at first.
 Take a look at this module on CPAN:
 http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm

 --
 Jeff Peng - [EMAIL PROTECTED]
 Professional Squid supports in China
 http://www.ChinaSquid.com/


Re: Parsing HTML file

2008-06-02 Thread Jeff Peng

On Mon, Jun 2, 2008 at 5:56 PM, Purohit, Bhargav
[EMAIL PROTECTED] wrote:

 I have a HTML file and which contains many tables in it.
 Out of one of the table I want to extract Information of only few of the
 columns.
 Can anybody help me.

Nobody can help unless you have tried some coding at first.
Take a look at this module on CPAN:
http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm

-- 
Jeff Peng - [EMAIL PROTECTED]
Professional Squid supports in China
http://www.ChinaSquid.com/


Re: parsing html question

2008-02-05 Thread Tom Phoenix

On Feb 5, 2008 10:36 AM, isaac2004 [EMAIL PROTECTED] wrote:

 hello, i am trying to parse an html document for links for output, my
 idea is to grab the URL from a form and send the URL to another file
 that does the actual parse process. i am aware that HTML::Parser has a
 built in for this, but i want to learn regex better.

Let me know if your technique helps you to learn regular expressions.
You will also have the chance to learn how complex a data format HTML
is, and why parsers are more complex than simple pattern matches.
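
To make that point concrete, here is a small sketch comparing a naive pattern with HTML::LinkExtor (part of the HTML::Parser distribution). The sample markup and the pattern are made up for illustration; the original poster's regex was not shown.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Made-up sample: a plain link, a link with extra attributes and single
# quotes, and a link that is commented out.
my $html = <<'HTML';
<p><a href="a.html">first</a>
<a class="x" href='b.html'>second</a>
<!-- <a href="old.html">commented out</a> -->
</p>
HTML

# Naive pattern: assumes href is the first attribute and is double-quoted.
my @by_regex = $html =~ /<a href="([^"]+)"/g;

# Parser: the callback receives the tag name and its link attributes.
my @by_parser;
my $extor = HTML::LinkExtor->new(
    sub {
        my ( $tag, %attr ) = @_;
        push @by_parser, $attr{href} if $tag eq 'a' && defined $attr{href};
    }
);
$extor->parse($html);
$extor->eof;

print "regex:  @by_regex\n";
print "parser: @by_parser\n";
```

The pattern picks up the commented-out link and misses the single-quoted one, while the parser reports only the two real links. A cleverer pattern can fix each case in turn, which is roughly how a regex grows into a parser.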

For other people who want to learn more about regular expressions, I'd
recommend the book Mastering Regular Expressions by Jeffrey Friedl.
REs are a small language of their own, worthy of study. This book has
interesting, useful, and practical information for any programmer who
works with regular expressions. In fact, when the book first came out,
everyone was expecting that any book on REs would be a pocket
reference. Everyone was stunned at how much of this fat book was news
even to the experts. (Several bugs and shortcomings of Perl's RE
engine were fixed as a direct result of that first edition.) I'm sure
that anyone who frequently uses Perl's patterns, or any other regular
expressions, will find this book informative today.

http://regex.info/

Cheers!

--Tom Phoenix
Stonehenge Perl Training


Re: parsing HTML content

2007-08-31 Thread ladder49

On Aug 30, 9:37 pm, [EMAIL PROTECTED] (Daniel Kasak) wrote:
 On Thu, 2007-08-30 at 07:16 -0700, ladder49 wrote:
  Is there a way to dump the HTML code for a web page?  I need to write
  a script which will collect and summarize content from intranet web
  pages.  By dump, I mean to read it the same way you would read a file
  and parse its contents.  Thanks.

 I use LWP::Simple to fetch stuff, and HTML::TreeBuilder to parse it and
 extract stuff.

 --
 Daniel Kasak
 IT Developer
 NUS Consulting Group
 Level 5, 77 Pacific Highway
 North Sydney, NSW, Australia 2060
 T: (+61) 2 9922-7676 / F: (+61) 2 9922 7989
 email: [EMAIL PROTECTED]
 website:http://www.nusconsulting.com.au

Daniel, Jeff,

Thanks for your replies.  Your pointing me to LWP got me started and
now I've got a working script.  Thanks again!



Re: parsing HTML content

2007-08-30 Thread Jeff Pang

2007/8/30, ladder49 [EMAIL PROTECTED]:
 Is there a way to dump the HTML code for a web page?  I need to write
 a script which will collect and summarize content from intranet web
 pages.  By dump, I mean to read it the same way you would read a file
 and parse its contents.  Thanks.


You can use LWP to do this, like:

perl -MLWP::Simple -e '$c=get("http://www.yahoo.cn/");print $c'

See also `perldoc lwpcook`.

-- 
Jeff Pang - [EMAIL PROTECTED]
http://www.readwriteweb.com/


Re: parsing HTML content

2007-08-30 Thread Daniel Kasak

On Thu, 2007-08-30 at 07:16 -0700, ladder49 wrote:

 Is there a way to dump the HTML code for a web page?  I need to write
 a script which will collect and summarize content from intranet web
 pages.  By dump, I mean to read it the same way you would read a file
 and parse its contents.  Thanks.

I use LWP::Simple to fetch stuff, and HTML::TreeBuilder to parse it and
extract stuff.
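
A rough sketch of that combination, assuming the goal is to list the links on a page. The URL in the comment is hypothetical and the HTML string is a made-up stand-in so the example runs without a network connection.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;

# Return [text, href] pairs for every link found in an HTML string.
sub extract_links {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @links = map { [ $_->as_text, $_->attr('href') ] }
        $tree->look_down( _tag => 'a', sub { defined $_[0]->attr('href') } );
    $tree->delete;    # release the tree once we are done with it
    return @links;
}

# In a real script the HTML would be fetched first, for example:
#   my $html = get('http://intranet.example/status.html')   # hypothetical URL
#       or die "fetch failed";
my $html = '<html><body><h1>Status</h1>'
         . '<a href="/reports/today.html">Today</a> '
         . '<a href="/reports/week.html">This week</a></body></html>';

printf "%-10s %s\n", @$_ for extract_links($html);
```

Keeping the fetch and the parse in separate steps makes the parsing part easy to try out on saved pages.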

--
Daniel Kasak
IT Developer
NUS Consulting Group
Level 5, 77 Pacific Highway
North Sydney, NSW, Australia 2060
T: (+61) 2 9922-7676 / F: (+61) 2 9922 7989
email: [EMAIL PROTECTED]
website: http://www.nusconsulting.com.au



Re: Parsing HTML (Table)

2007-07-17 Thread Rob Dixon


yitzle wrote:


 I'm using WWW::Mechanize to retrieve a web page. I get to this line:

 my $page = $mech->response()->decoded_content();

 The page has a table with values I wish to extract. What module is
 best suited to getting to that data?

 I'm hoping for a somewhat simple to use module. WWW::Mechanize is the
 first object-oriented module I've used (for this project), so I'm still
 getting used to it.


Hi. I recommend HTML::TableExtract for what you're describing. We can
help you more if you give us the URL of the web page you're looking at
and what data you need from it.
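
Without the real page, a sketch can only guess at the table. The one below assumes a page with several tables where only the one headed Item/Colour/Price matters, and only two of its columns are wanted; the markup and the header names are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Stand-in for $mech->response()->decoded_content(); invented for this sketch.
my $page = <<'HTML';
<table><tr><td>Navigation</td><td>Home</td></tr></table>
<table>
<tr><th>Item</th><th>Colour</th><th>Price</th></tr>
<tr><td>Widget</td><td>red</td><td>1.50</td></tr>
<tr><td>Gadget</td><td>blue</td><td>2.75</td></tr>
</table>
HTML

# Naming headers both selects the matching table and limits the columns
# that come back to the ones listed, in the order listed.
my $te = HTML::TableExtract->new( headers => [ 'Item', 'Price' ] );
$te->parse($page);

my %price;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        my ( $item, $price ) = @$row;
        $price{$item} = $price;
    }
}

print "$_ costs $price{$_}\n" for sort keys %price;
```

If the target table has no usable header row, the module's depth and count options are another way to pick it out by position.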

HTH,

Rob


Re: parsing html data

2007-05-23 Thread xavier mas

On Tuesday 22 May 2007 22:38, [EMAIL PROTECTED] wrote:
 Hi,

 If the problem is to get a dictionary translation, I think you should not
 work directly with LWP or WWW::Mechanize. Those modules provide a
 convenient way to get the answer in HTML. This will force you to hack HTML
 files.

 Alternatively, you can use dictionaries that provide an API, like
 www.dictionary.com (and use the Perl module WWW::Dictionary
 http://search.cpan.org/~cog/WWW-Dictionary-0.01/lib/WWW/Dictionary.pm) or
 even Wikipedia (WWW::Wikipedia at
 http://search.cpan.org/~bricas/WWW-Wikipedia-1.92/lib/WWW/Wikipedia.pm).

 You can find many more dictionaries on CPAN.

 Yaron Kahanovitch


 - Original Message -
 From: xavier mas [EMAIL PROTECTED]
 To: beginners@perl.org
 Sent: 22:14:07 (GMT+0200) Africa/Harare Tuesday 22 May 2007
 Subject: Re: parsing html data

 On Tuesday 22 May 2007 22:00, David Moreno Garza wrote:
  xavier mas wrote:
   dear all,
  
   I'm trying to query an online dictionary from an HTML document; the
   result of the query has to be output in a text field of the same HTML
   document.
  
   I understand this can be done using (Perl) CGI scripts, but I am not sure
   which module I need for that.
 
  A number of modules around LWP can help you with it. WWW::Mechanize,
  specifically, can help you with its specific method, content():
 
  $mech->content(format => 'text');
 
  --
  David Moreno Garza [EMAIL PROTECTED] | http://www.damog.net/
   URL:http://pub.tsn.dk/how-to-quote.php

 Many thanks, David.
 --
 Xavier Mas


Thank you, Yaron. I know this, but the purpose is to build an application
that does this itself.

Greetings,
-- 
Xavier Mas


Re: parsing html data

2007-05-22 Thread David Moreno Garza

xavier mas wrote:
 dear all,
 
 I'm trying to query an online dictionary from an HTML document; the
 result of the query has to be output in a text field of the same HTML
 document.
 
 I understand this can be done using (Perl) CGI scripts, but I am not sure
 which module I need for that.

A number of modules around LWP can help you with it. WWW::Mechanize,
specifically, can help you with its specific method, content():

$mech->content(format => 'text');

-- 
David Moreno Garza [EMAIL PROTECTED] | http://www.damog.net/
 URL:http://pub.tsn.dk/how-to-quote.php



Re: parsing html data

2007-05-22 Thread xavier mas

On Tuesday 22 May 2007 22:00, David Moreno Garza wrote:
 xavier mas wrote:
  dear all,
 
  I'm trying to query an online dictionary from an HTML document; the
  result of the query has to be output in a text field of the same HTML
  document.
 
  I understand this can be done using (Perl) CGI scripts, but I am not sure
  which module I need for that.

 A number of modules around LWP can help you with it. WWW::Mechanize,
 specifically, can help you with its specific method, content():

 $mech->content(format => 'text');

 --
 David Moreno Garza [EMAIL PROTECTED] | http://www.damog.net/
  URL:http://pub.tsn.dk/how-to-quote.php

Many thanks, David.
-- 
Xavier Mas


Re: parsing html data

2007-05-22 Thread yaron

Hi,

If the problem is to get a dictionary translation, I think you should not work
directly with LWP or WWW::Mechanize.
Those modules provide a convenient way to get the answer in HTML, which will
force you to hack HTML files.

Alternatively, you can use dictionaries that provide an API, like
www.dictionary.com (with the Perl module WWW::Dictionary,
http://search.cpan.org/~cog/WWW-Dictionary-0.01/lib/WWW/Dictionary.pm) or even
Wikipedia (WWW::Wikipedia at
http://search.cpan.org/~bricas/WWW-Wikipedia-1.92/lib/WWW/Wikipedia.pm).

You can find many more dictionaries on CPAN.

Yaron Kahanovitch


- Original Message -
From: xavier mas [EMAIL PROTECTED]
To: beginners@perl.org
Sent: 22:14:07 (GMT+0200) Africa/Harare Tuesday 22 May 2007
Subject: Re: parsing html data

On Tuesday 22 May 2007 22:00, David Moreno Garza wrote:
 xavier mas wrote:
  dear all,
 
  I'm trying to query an online dictionary from an HTML document; the
  result of the query has to be output in a text field of the same HTML
  document.
 
  I understand this can be done using (Perl) CGI scripts, but I am not sure
  which module I need for that.

 A number of modules around LWP can help you with it. WWW::Mechanize,
 specifically, can help you with its specific method, content():

 $mech->content(format => 'text');

 --
 David Moreno Garza [EMAIL PROTECTED] | http://www.damog.net/
  URL:http://pub.tsn.dk/how-to-quote.php

Many thanks, David.
-- 
Xavier Mas


Re: Parsing HTML

2005-09-01 Thread Scott Taylor


Jenda Krynicky said:
 From: Scott Taylor [EMAIL PROTECTED]
 I'm probably reinventing the wheel here, but I tried to get along with
 HTML::Parser and just couldn't get it to do anything.  Too confusing, I
 think.

 I simply want to get a list of real words from an HTML string, minus
 all the HTML stuff.  For example:

 snipped

 use HTML::JFilter qw(StripHTML);
  # http://www24.brinkster.com/jenda/#HTML::JFilter

 $plain_text = StripHTML($text_with_html);

Nice thought, but the link leads nowhere.  Cute comic though.

Thanks.

--
Scott


Re: Parsing HTML

2005-09-01 Thread Scott Taylor


Jenda Krynicky said:
 From: Scott Taylor [EMAIL PROTECTED]
 I'm probably reinventing the wheel here, but I tried to get along with
 HTML::Parser and just couldn't get it to do anything.  Too confusing, I
 think.

 I simply want to get a list of real words from an HTML string, minus
 all the HTML stuff.  For example:

 snipped

 use HTML::JFilter qw(StripHTML);
  # http://www24.brinkster.com/jenda/#HTML::JFilter

 $plain_text = StripHTML($text_with_html);

Oops, sorry, there it is, just really slow. :)



RE: Parsing HTML

2005-08-29 Thread Charles K. Clarkson

Scott Taylor mailto:[EMAIL PROTECTED] wrote:

: Is there a better, maybe more elegant, way to do this?  I don't
: mind using HTML::Parser if I could only figure out how.

use HTML::TokeParser;

my $html = q(

This is a line of HTML:people write strange things here<br>
and hardly ever follow proper<p>
syntax A&amp;B suck at spelling as well<br>
So I need to clean it up and strip out all<br>

words less then 3 characters in length.<p>

Later the words will go into an indexer for<br>
searching a database

);

my $p = HTML::TokeParser->new( \$html );

while (my $token = $p->get_token) {
    my $string = $p->get_trimmed_text;
    $string = "\n$string" if $token->[1] eq 'br';
    $string = "\n$string" if $token->[1] eq 'p';
    print $string;
}

__END__
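
Going one step further toward the word list asked for, a possible sketch: collect only the text tokens, decode entities, and keep words of three or more characters. The sample string and the definition of a "word" (a run of \w characters) are assumptions for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Made-up sample input for this sketch.
my $html = '<p>So I need to clean it up &amp; strip out all<br>'
         . 'words of less than 3 characters.</p>';

my $p = HTML::TokeParser->new( \$html ) or die "cannot parse string";

my @words;
while ( my $token = $p->get_token ) {
    next unless $token->[0] eq 'T';    # keep text tokens, skip tags
    my $text = decode_entities( $token->[1] );
    # A "word" here is a run of \w characters at least 3 long.
    push @words, grep { length($_) >= 3 } $text =~ /(\w+)/g;
}

print join( ' ', @words ), "\n";
```

The resulting list could then be lowercased or de-duplicated before it goes to the indexer.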

HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



Re: Parsing HTML Data

2005-06-19 Thread Chris Devers

On Sun, 19 Jun 2005, santosh kumar wrote:

 [Please] help me [with] the perl code reg ex for achieving the
 following :
 ( Please note that we should not use any CPAN modules like HTML::Parser etc )

Please show what you've tried so far.

Please explain why you can't use a CPAN module for this. 

A parser is absolutely the right way to solve this problem.

A regex is absolutely the wrong way to solve this problem.

And reinventing that which has already been done 1000 times before is 
absolutely, positively, no doubt at all not a good use of your time :-)
 
But, if you're absolutely set on doing this the way you ask, you *must* 
clarify *exactly* what you want to extract, and you *must* explain what 
code you've tried already. Only then can we help sort things out.

This list is not a free script writing service. Sorry for any confusion.


-- 
Chris Devers


Re: Parsing HTML Data

2005-06-19 Thread Randal L. Schwartz

 Santosh == Santosh Kumar [EMAIL PROTECTED] writes:

Santosh> All,
Santosh> Pls help me the perl code reg ex for achieving the following :
Santosh> ( Please note that we should not use any CPAN modules like HTML::Parser etc )

Santosh> file.html
Santosh> <html>
Santosh> UserName<input...>santdhdkgfkdfg</td>
Santosh> Password<input...><h1>hh1</h1>
Santosh> </html>

Santosh> Required Output is :
Santosh> UserName
Santosh> Password

Homework?

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
merlyn@stonehenge.com URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


RE: Parsing HTML Tables

2005-04-30 Thread Charles K. Clarkson

Sri Pen mailto:[EMAIL PROTECTED] wrote:

: and second @tableDefRows match the data between
: <TD>*..4321191..</TD>. It matches all the data between
: <TABLE><TBODY><TR>*...</TR></TBODY></TABLE>; something is wrong
: here.

You are getting this because perl regular expressions are
greedy by default and because regular expressions alone are a poor
substitute for a good parser of HTML.


: Do I need to somehow start from <TABLE> and match all the
: way to the first </td> and use some backtracking or something?

Perhaps. It is probably best to just scrap the regular
expressions and use a module made for parsing HTML. Here's some
code using a general HTML parser, but there are many table parsers
as well.

use HTML::TokeParser;

my $parser = HTML::TokeParser->new( \$html_string );

my @rows;
while ( $parser->get_tag( 'tr' ) ) {

    # check error number
    next unless $parser->get_trimmed_text( '/td' ) eq '';

    # get next cell
    $parser->get_tag( 'td' );

    # check userid
    next unless $parser->get_trimmed_text( '/td' ) eq 'YHIRA';

    my @cells;
    while ( $parser->get_tag( 'td' ) ) {
        my $text = $parser->get_trimmed_text( '/td' );
        push @cells, $text if $text and $text =~ /4321191/;
    }

    push @rows, [ @cells ] if @cells;
}


HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



RE: Parsing HTML Tables

2005-04-30 Thread Charles K. Clarkson

Charles K. Clarkson mailto:[EMAIL PROTECTED] wrote:

Oopsie!

: my @rows;
: while ( $parser->get_tag( 'tr' ) ) {
:
:     # check error number
:     next unless $parser->get_trimmed_text( '/td' ) eq '';

    next unless $parser->get_trimmed_text( '/td' ) eq '4321191';

:
:     # get next cell
:     $parser->get_tag( 'td' );
:
:     # check userid
:     next unless $parser->get_trimmed_text( '/td' ) eq 'YHIRA';
:
:     my @cells;
:     while ( $parser->get_tag( 'td' ) ) {
:         my $text = $parser->get_trimmed_text( '/td' );
:         push @cells, $text if $text and $text =~ /4321191/;
:     }
:
:     push @rows, [ @cells ] if @cells;
: }



Re: parsing HTML

2004-07-22 Thread Randy W. Sims

Andrew Gaffney wrote:
 Here is my current working code. Please take a look at it and see if
 there are any obvious (or not so obvious) problems. I thought this would
 end up being far more difficult.

 [snip code]

Looks good to me. Once you get used to the idea of event based parsing, 
storing context information on a stack, it's really simple, and even 
fun. Another nice thing is once you've mastered one (HTML::Parser), 
you've mastered them all (Pod::Parser, XML::Parser, etc.).

Regards,
Randy.

tracking where I am in a tree structure (was: Re: parsing HTML)

2004-07-22 Thread Andrew Gaffney

Andrew Gaffney wrote:
Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build a HTML editor for use with my HTML::Mason 
site. I intend for it to support nested tables, SPANs, and 
anchors. I am looking for a module that can help me parse existing 
HTML (custom or generated by my scripts) into a tree structure 
similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300,
               content =>
   [ { tag => 'tr', content =>
     [
       { tag => 'td', width => 200, content => "some content" },
       { tag => 'td', width => 100, content => "more content" }
     ]
   } ]
} ]; # Not tested, but you get the idea

[snip]
I'd rather generate a structure similar to what I have above instead 
of having a large tree of class objects that takes up more RAM and 
is probably slower. How would I go about generating a structure such 
as that above using HTML::Parser?

Parsers like HTML::Parser scan a document and upon encountering 
certain tokens fire off events. In the case of HTML::Parser, events 
are fired when encountering a start tag, the text between tags, and 
at the end tag. If you have an arbitrarily deep document structure 
like HTML, you can store the structure using a stack:

SNIP
Thanks. In the time it took you to put that together, I came up with 
the following to figure out how HTML::Parser works. I'll use your code 
to expand upon it.

SNIP
Here is my current working code. Please take a look at it and see if 
there are any obvious (or not so obvious) problems. I thought this would 
end up being far more difficult.

parsehtml.pl

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

my $htmltree = [ { tag => 'document', content => [] } ];
my $node = $htmltree->[0]->{content};
my @prevnodes = ($htmltree);

sub start {
  my $tagname = shift;
  my $attr = shift;
  my $newnode = {};
  $newnode->{tag} = $tagname;
  foreach my $key(keys %{$attr}) {
    $newnode->{$key} = $attr->{$key};
  }
  $newnode->{content} = [];
  push @prevnodes, $node;
  push @{$node}, $newnode;
  $node = $newnode->{content};
}

sub end {
  my $tagname = shift;
  $node = pop @prevnodes;
}

sub text {
  my $text = shift;
  chomp $text;
  if($text ne '') {
    push @{$node}, $text;
  }
}

my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           text_h  => [\&text,  "dtext"] );
$p->parse_file("test.html");

use Data::Dumper;
print Dumper $htmltree;

test.html
=========
<table id="maintable" width="300">
<tr>
<td width="200">some content</td>
<td width="100">more content</td>
</tr>
</table>

Now for the next challenge. I need to be able to know where I am in the tree 
structure for any node that I am in while I am walking it. I will pass along a 
value via CGI in the form of '0.0.2.1.2' which another script will translate as 
'$htmltree->[0]->{content}->[0]->{content}->[2]->{content}->[1]->{content}->[2]'. 
Using the above code, and the following code I wrote for walking the tree and 
generating HTML from it, how can I mark each outputted HTML tag with its 
position in the tree?

sub descend_htmltree {
  my $node = shift;
  my $withclickiness = shift || 0;
  foreach my $tmpnode (@{$node}) {
    if(ref($tmpnode) eq 'HASH') {
      my $nodeid = ''; # Magic code to generate node's position in tree
      $htmloutput .= "<div style='border: thin solid #bb' onDblClick=\"alert('you clicked $nodeid')\">" if($withclickiness);
      $htmloutput .= "<$tmpnode->{tag}";
      foreach(keys %{$tmpnode}) {
        $htmloutput .= " $_=\"$tmpnode->{$_}\"" if($_ ne 'tag' && $_ ne 'content');
      }
      $htmloutput .= ">";
      # pass the flag down so nested tags get the wrapper too
      descend_htmltree($tmpnode->{content}, $withclickiness);
      $htmloutput .= "</$tmpnode->{tag}>";
      $htmloutput .= "</div>" if($withclickiness);
    } else {
      $htmloutput .= $tmpnode;
    }
  }
}

sub htmltree_to_html {
  my $filename = shift || '';
  my $withclickiness = shift || 0;
  descend_htmltree($htmltree->[0]->{content}, $withclickiness);
  if($filename ne '') {
    open HTML, "> $filename" or die "Can't open $filename for HTML output";
    print HTML $htmloutput;
    close HTML;
  }
  return $htmloutput;
}
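
One possible way to get the position, sketched under assumptions: carry the path down through the recursion as a dotted string, appending each child's index. The helper names list_paths and node_at are hypothetical, and the tree is written by hand in the shape that parsehtml.pl produces.

```perl
use strict;
use warnings;

# Hand-written tree in the same shape as the parser output.
my $htmltree = [ { tag => 'document', content => [
    { tag => 'table', id => 'maintable', content => [
        { tag => 'tr', content => [
            { tag => 'td', content => ['some content'] },
            { tag => 'td', content => ['more content'] },
        ] },
    ] },
] } ];

# Walk a content array; return [path, tag] for every element node.
# $prefix is the dotted path of the node that owns this content array.
sub list_paths {
    my ( $nodes, $prefix ) = @_;
    my @found;
    for my $i ( 0 .. $#{$nodes} ) {
        my $child = $nodes->[$i];
        next unless ref $child eq 'HASH';    # skip text leaves
        my $path = "$prefix.$i";
        push @found, [ $path, $child->{tag} ],
            list_paths( $child->{content} || [], $path );
    }
    return @found;
}

# Translate a dotted path such as '0.0.0.1' back into a node.
sub node_at {
    my ( $tree, $path ) = @_;
    my ( $first, @rest ) = split /\./, $path;
    my $node = $tree->[$first];
    $node = $node->{content}[$_] for @rest;
    return $node;
}

my @paths = list_paths( $htmltree->[0]{content}, '0' );
print "$_->[0] $_->[1]\n" for @paths;
print node_at( $htmltree, '0.0.0.1' )->{content}[0], "\n";
```

In descend_htmltree the same idea would mean passing the current path as an extra argument and using it for $nodeid.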
--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

Re: parsing HTML

2004-07-21 Thread Randy W. Sims

On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build a HTML editor for use with my HTML::Mason site. I 
intend for it to support nested tables, SPANs, and anchors. I am looking 
for a module that can help me parse existing HTML (custom or generated 
by my scripts) into a tree structure similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300,
               content =>
   [ { tag => 'tr', content =>
     [
       { tag => 'td', width => 200, content => "some content" },
       { tag => 'td', width => 100, content => "more content" }
     ]
   } ]
} ]; # Not tested, but you get the idea

which would correspond to the following HTML:
<table id="maintable" width="300">
<tr>
<td width="200">some content</td>
<td width="100">more content</td>
</tr>
</table>
Once I have the data in the tree, I can easily modify it and transform 
it back into HTML. Is there a module that can help make this easier or 
should I go about this differently?

HTML::Parser doesn't build a tree, but you can use it to build one if 
necessary. However, you might find that building a tree is not necessary, 
and doing without one is less memory intensive.

Then there is HTML::Tree.
Regards,
Randy.


Re: parsing HTML

2004-07-21 Thread Andrew Gaffney

Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build a HTML editor for use with my HTML::Mason site. I 
intend for it to support nested tables, SPANs, and anchors. I am 
looking for a module that can help me parse existing HTML (custom or 
generated by my scripts) into a tree structure similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300,
               content =>
   [ { tag => 'tr', content =>
     [
       { tag => 'td', width => 200, content => "some content" },
       { tag => 'td', width => 100, content => "more content" }
     ]
   } ]
} ]; # Not tested, but you get the idea

which would correspond to the following HTML:
<table id="maintable" width="300">
<tr>
<td width="200">some content</td>
<td width="100">more content</td>
</tr>
</table>
Once I have the data in the tree, I can easily modify it and transform 
it back into HTML. Is there a module that can help make this easier or 
should I go about this differently?
HTML::Parser doesn't build a tree, but you can use it to build one if 
necessary. However, you might find that building a tree is not necessary, 
and doing without one is less memory intensive.

Then there is HTML::Tree.
I'd rather generate a structure similar to what I have above instead of having a 
large tree of class objects that takes up more RAM and is probably slower. How 
would I go about generating a structure such as that above using HTML::Parser?

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

Re: parsing HTML

2004-07-21 Thread Randy W. Sims

On 7/21/2004 11:24 PM, Andrew Gaffney wrote:
Randy W. Sims wrote:
On 7/21/2004 10:42 PM, Andrew Gaffney wrote:
I am trying to build a HTML editor for use with my HTML::Mason site. 
I intend for it to support nested tables, SPANs, and anchors. I am 
looking for a module that can help me parse existing HTML (custom or 
generated by my scripts) into a tree structure similar to:

my $html = [ { tag => 'table', id => 'maintable', width => 300,
               content =>
   [ { tag => 'tr', content =>
     [
       { tag => 'td', width => 200, content => "some content" },
       { tag => 'td', width => 100, content => "more content" }
     ]
   } ]
} ]; # Not tested, but you get the idea

[snip]
I'd rather generate a structure similar to what I have above instead of 
having a large tree of class objects that takes up more RAM and is 
probably slower. How would I go about generating a structure such as 
that above using HTML::Parser?
Parsers like HTML::Parser scan a document and upon encountering certain 
tokens fire off events. In the case of HTML::Parser, events are fired 
when encountering a start tag, the text between tags, and at the end 
tag. If you have an arbitrarily deep document structure like HTML, you 
can store the structure using a stack:

#!/usr/bin/perl
package SampleParser;
use strict;
use HTML::Parser;
use base qw(HTML::Parser);

sub start {
    my($self, $tagname, $attr, $attrseq, $origtext) = @_;
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "$tagname\n";
    push @{$self->{_stack}}, ' ';
}

sub end {
    my($self, $tagname, $origtext) = @_;
    pop @{$self->{_stack}};
    my $stack = $self->{_stack};
    my $depth = $stack ? @$stack : 0;
    print ' ' x $depth, "/$tagname\n";
}
1;

package main;
use strict;
use warnings;

my $p = SampleParser->new();
$p->parse_file(\*DATA);

__DATA__
<html>
<head>
<title>Title</title>
<body>
The body.
</body>
</html>


Re: parsing HTML

2004-07-21 Thread Andrew Gaffney

Randy W. Sims wrote:
[snip]

Thanks. In the time it took you to put that together, I came up with the 
following to figure out how HTML::Parser works. I'll use your code to expand 
upon it.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

sub start {
  print "start ";
  foreach my $arg (@_) {
    if(ref($arg) eq 'HASH') {
      foreach my $key (keys %{$arg}) {
        print "  $key -> $arg->{$key}\n";
      }
    } else {
      print "$arg\n";
    }
  }
}

sub end {
  print "end ";
  foreach (@_) {
    print "$_\n";
  }
}

sub text {
  my $text = shift;
  chomp $text;
  print "  text -> '$text'\n" if($text ne '');
}

my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           text_h  => [\&text,  "dtext"],
                           marked_sections => 1 ); # Not sure what this does
$p->parse_file("test.html");

The above gives me the expected output for the sample HTML I provided before.
--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

Re: parsing HTML

2004-07-21 Thread Andrew Gaffney

Andrew Gaffney wrote:
[snip]

Here is my current working code. Please take a look at it and see if there are 
any obvious (or not so obvious) problems. I thought this would end up being far 
more difficult.

parsehtml.pl
============
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

my $htmltree = [ { tag => 'document', content => [] } ];
my $node = $htmltree->[0]->{content};
my @prevnodes = ($htmltree);

sub start {
  my $tagname = shift;
  my $attr = shift;
  my $newnode = {};
  $newnode->{tag} = $tagname;
  foreach my $key (keys %{$attr}) {
    $newnode->{$key} = $attr->{$key};
  }
  $newnode->{content} = [];
  push @prevnodes, $node;
  push @{$node}, $newnode;
  $node = $newnode->{content};
}

sub end {
  my $tagname = shift;
  $node = pop @prevnodes;
}

sub text {
  my $text = shift;
  chomp $text;
  if($text ne '') {
    push @{$node}, $text;
  }
}

my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           text_h  => [\&text,  "dtext"] );
$p->parse_file("test.html");

use Data::Dumper;
print Dumper $htmltree;

test.html
=========
<table id="maintable" width="300">
<tr>
<td width="200">some content</td>
<td width="100">more content</td>
</tr>
</table>
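
One not-so-obvious problem with the flat layout above: an HTML attribute that happens to be named "tag" or "content" (a meta tag has a content attribute, for instance) collides with the node's own keys. Below is a minimal sketch of one way around that, assuming HTML::Parser is installed. Attributes go into their own hash, whitespace-only text is skipped, and a stray end tag can never pop the root node. The name build_tree() and the "attr" key are made up for this sketch; they are not part of HTML::Parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# build_tree() is a hypothetical helper: it parses a string of HTML and
# returns a nested structure of { tag, attr, content } hashes.
sub build_tree {
    my ($html) = @_;
    my $root  = { tag => 'document', attr => {}, content => [] };
    my @stack = ($root);    # currently open elements, innermost last

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            # Attributes live in their own hash so they cannot
            # overwrite the node's "tag" or "content" keys.
            my $node = { tag => $tagname, attr => { %$attr }, content => [] };
            push @{ $stack[-1]{content} }, $node;
            push @stack, $node;
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tagname) = @_;
            # Close the innermost open element with this name, along
            # with anything left open inside it. Index 0 (the root) is
            # never removed, so a stray end tag is simply ignored.
            for my $i (reverse 1 .. $#stack) {
                if ($stack[$i]{tag} eq $tagname) {
                    splice @stack, $i;
                    last;
                }
            }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            # Skip runs that are only whitespace (indentation, newlines).
            push @{ $stack[-1]{content} }, $text if $text =~ /\S/;
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;
    return $root;
}

my $tree = build_tree(<<'HTML');
<table id="maintable" width="300">
  <tr>
    <td width="200">some content</td>
    <td width="100">more content</td>
  </tr>
</table>
HTML

my $table    = $tree->{content}[0];
my $first_td = $table->{content}[0]{content}[0];
print "$table->{tag} id=$table->{attr}{id}\n";
print "$first_td->{tag}: $first_td->{content}[0]\n";
```

The price of the separate "attr" hash is one extra level when reading an attribute ($node->{attr}{width} rather than $node->{width}). Whether that is worth it depends on how much arbitrary HTML the editor has to accept.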
--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

Re: Parsing HTML Form Data in Perl

2002-06-11 Thread Tor Hildrum


> I have written an e-mail script but cannot get the From part of the
> sendmail protocol to recognize e-mails with a period in the user name
> like [EMAIL PROTECTED]  It causes errors.  I know if I send
> brook/.hurd@gm/.com it will work.

/. or \.?

> I cannot locate the code required to
> parse incoming e-mail formatted data to add / before each period.  Does
> anyone have any suggestions or a better idea on how to handle this?

[localhost:~] tor% perl -e '$string = "test.test\@something.com";
$string =~ s/(\.)/\\$1/g; print $string;'
test\.test@something\.com
[localhost:~] tor% 

Or

[localhost:~] tor% perl -e '$array = "test.test\@something.com"; $array =~
s/(\.)/\/$1/g; print $array;'
test/.test@something/.com
[localhost:~] tor% 

=~ s/(\.)/\/$1/g; seems to be what you are looking for.
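
The two one-liners differ only in the character placed before each period, so they can be folded into one small routine. This is only a sketch: escape_dots() is a made-up helper name, and whether sendmail actually needs either form is not settled in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# escape_dots() is a hypothetical helper: it returns a copy of $address
# with $escape inserted in front of every literal period.
sub escape_dots {
    my ($address, $escape) = @_;
    (my $escaped = $address) =~ s/\./${escape}./g;
    return $escaped;
}

my $address = 'test.test@something.com';
print escape_dots($address, '\\'), "\n";   # test\.test@something\.com
print escape_dots($address, '/'),  "\n";   # test/.test@something/.com
```

Working on a copy leaves the original address untouched, which helps if the unescaped form is still needed elsewhere in the script.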

Tor


