RE: html tree question. clumsy ?
Thanks for your info, Rob. But look at what you wrote: "I'm pretty sure", "I understand it to mean", "I believe"... My comments and solutions are based on my _actually_writing_code_ to try to do the things you muse about, and _it_did_not_work_.

Don't take this the wrong way, Rob; I just want to make things clear for other people reading this who might run into the same problem and/or be inclined to try it out.

 - - Martin

-----Original Message-----
From: Rob Dixon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 23, 2003 5:27 PM
To: [EMAIL PROTECTED]
Subject: Re: html tree question. clumsy ?

Martin Thurn wrote:
> I ran into similar problems for my module WWW::Search. No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file.

I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof' methods. The documentation says:

    After $p->eof has been called, the parse() and parse_file() methods can be invoked to feed new documents with the parser object.

which is poor English, but I understand it to mean that, once the 'eof' method has been called, any further calls to 'parse' or 'parse_file' will create a new HTML tree from scratch.

> BUT you can use the following code as a reset. I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:
>
>     $self->{'_head'} = $self->insert_element('head',1);
>     $self->{'_pos'} = undef; # pull it back up
>     $self->{'_body'} = $self->insert_element('body',1);
>     $self->{'_pos'} = undef; # pull it back up again

HTML::Parser will itself insert any implicit html, head and body tags when further input is parsed.

> The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.

Once you have called $tree->delete the object no longer exists, but I believe $tree->delete_content or $tree->eof will allow you to reuse the same object for parsing a new document.

Rob
Re: html tree question. clumsy ?
Martin Thurn wrote:
> I ran into similar problems for my module WWW::Search. No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file.

I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof' methods. The documentation says:

    After $p->eof has been called, the parse() and parse_file() methods can be invoked to feed new documents with the parser object.

which is poor English, but I understand it to mean that, once the 'eof' method has been called, any further calls to 'parse' or 'parse_file' will create a new HTML tree from scratch.

> BUT you can use the following code as a reset. I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:
>
>     $self->{'_head'} = $self->insert_element('head',1);
>     $self->{'_pos'} = undef; # pull it back up
>     $self->{'_body'} = $self->insert_element('body',1);
>     $self->{'_pos'} = undef; # pull it back up again

HTML::Parser will itself insert any implicit html, head and body tags when further input is parsed.

> The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.

Once you have called $tree->delete the object no longer exists, but I believe $tree->delete_content or $tree->eof will allow you to reuse the same object for parsing a new document.

Rob
Re: html tree question. clumsy ?
James.Q.L wrote:
> Thurn, Martin wrote:
> > I ran into similar problems for my module WWW::Search. No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file. BUT you can use the following code as a reset. I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:
> >
> >     $self->{'_head'} = $self->insert_element('head',1);
> >     $self->{'_pos'} = undef; # pull it back up
> >     $self->{'_body'} = $self->insert_element('body',1);
> >     $self->{'_pos'} = undef; # pull it back up again
>
> I will look into how that can be added to $tree later, but a reset method from HTML::TreeBuilder would be nice.

As per my previous post, I think $tree->eof or $tree->delete_content will do the trick.

> > The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.
> >
> >  - - Martin Thurn
>
> I just found that $h->clone can get around this. Too bad I can't do the same to a tree before it parses the file.
>
> From HTML::Element:
>
>     You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're done being parsed, or 2) you don't expect to resume parsing into the clone. (You can continue parsing into the original; it is never affected.)
>
> Just to make sure (English isn't my first language), does it say that I can't clone a tree if it doesn't parse something?

No. What this is saying is that you can't do:

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($string1);
    $tree->parse($string2);

    my $tree2 = $tree->clone;
    $tree2->parse($string3);

But, here, you could write

    $tree->parse($string4);

In other words, you can't carry on parsing HTML text into a cloned tree, but the original tree is unaffected.

HTH,

Rob
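The permitted case from the HTML::Element passage quoted above (cloning a tree that is done being parsed) can be sketched roughly as follows. This is only an illustration: the sample HTML string and the variable names are made up here, and it assumes an HTML::TreeBuilder installation that behaves as its documentation describes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse a document to completion before cloning it.
my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><p>hello</p></body></html>');  # illustrative input
$tree->eof;                      # parsing is finished from here on

# The clone is an independent copy of the finished tree.
my $copy = $tree->clone;

# Deleting the original should leave the copy intact.
$tree->delete;
my $text = $copy->as_text;
print "$text\n";

# The copy is a tree of its own, so it needs its own delete.
$copy->delete;
```

The copy is used here only for reading; nothing further is parsed into it, which is the restriction the documentation spells out.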
Re: html tree question. clumsy ?
At 09:42 AM 2003-09-23, James.Q.L wrote:
> 1. will $tree->parse_file(parse a new file) overwrite the old parsed $tree content?

No, don't try that. You can call parse_file on a $tree only once. As far as I remember, there's no re-using it.

> so that I don't have to delete the tree in the sub?

Why not just delete the tree?

As to your question here:

> ## why can't keep it outside ?/
> my $literal = HTML::Element->new('~literal','text' => $insert);

I think because once you insert the $literal object into a tree, then when you later destroy the tree, that deletes that $literal object, too. (Deleting a tree basically means deleting every attribute in every node in the tree.)

--
Sean M. Burke  http://search.cpan.org/~sburke/
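The "one tree per document, then delete it" approach described above might look something like this minimal sketch. The title_of helper and the sample HTML strings are hypothetical and appear nowhere in the thread; the sketch uses parse and eof on strings where the thread discusses parse_file on files.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse one document with its own tree,
# extract the title text, then free the tree.
sub title_of {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;   # a fresh tree for every document
    $tree->parse($html);
    $tree->eof;                          # signal end of input
    my $title = $tree->look_down(_tag => 'title');
    my $text  = $title ? $title->as_text : '';
    $tree->delete;                       # release the tree once we are done
    return $text;
}

my @titles = map { title_of($_) } (
    '<html><head><title>First</title></head><body><p>one</p></body></html>',
    '<html><head><title>Second</title></head><body><p>two</p></body></html>',
);
print join(', ', @titles), "\n";
```

Because each call builds and deletes its own tree, no state from one document can leak into the next, at the cost of constructing a new parser object each time.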
RE: html tree question. clumsy ?
I ran into similar problems for my module WWW::Search. No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file.

BUT you can use the following code as a reset. I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:

    $self->{'_head'} = $self->insert_element('head',1);
    $self->{'_pos'} = undef; # pull it back up
    $self->{'_body'} = $self->insert_element('body',1);
    $self->{'_pos'} = undef; # pull it back up again

The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.

 - - Martin Thurn
RE: html tree question. clumsy ?
--- Thurn, Martin [EMAIL PROTECTED] wrote:
> I ran into similar problems for my module WWW::Search. No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file. BUT you can use the following code as a reset. I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:
>
>     $self->{'_head'} = $self->insert_element('head',1);
>     $self->{'_pos'} = undef; # pull it back up
>     $self->{'_body'} = $self->insert_element('body',1);
>     $self->{'_pos'} = undef; # pull it back up again

I will look into how that can be added to $tree later, but a reset method from HTML::TreeBuilder would be nice.

> The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.
>
>  - - Martin Thurn

I just found that $h->clone can get around this. Too bad I can't do the same to a tree before it parses the file.

From HTML::Element:

    You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're done being parsed, or 2) you don't expect to resume parsing into the clone. (You can continue parsing into the original; it is never affected.)

Just to make sure (English isn't my first language), does it say that I can't clone a tree if it doesn't parse something?

Qiang
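The $h->clone workaround mentioned above could be sketched like this: the '~literal' element is built once outside any tree, and only a clone of it is inserted into each tree, so deleting the tree destroys the copy rather than the original. The render_with_insert helper and the sample markup are hypothetical, chosen purely for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Build the reusable snippet once, outside any tree.
my $literal = HTML::Element->new('~literal', text => '<b>inserted</b>');

# Hypothetical helper: parse a page, append a copy of the snippet
# to its body, serialize, and free the tree.
sub render_with_insert {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;
    my $body = $tree->look_down(_tag => 'body');
    $body->push_content($literal->clone);   # the tree owns only the copy
    my $out = $tree->as_HTML;
    $tree->delete;                          # destroys the copy, not $literal
    return $out;
}

my @pages = map { render_with_insert($_) } ('<p>one</p>', '<p>two</p>');
print "$_\n" for @pages;
```

The design choice is simply to hand each tree something it may destroy, which keeps the element outside the sub alive across calls.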