On Sun Jan 31, 2010 at 10:54:46 +0800, Zhang Weiwu wrote:

> I want to remove all advertisements in my 100 html files. They are
> pretty neatly classed, like the following:
>
> <div class="advertisement">
> ...
> </div>

You might enjoy my "html-tool" command, which would do the job for you via:

    html-tool --cut-class=advertisement --file input.html

You can get it via:

    wget http://mybin.repository.steve.org.uk/raw-file/tip/html-tool

Or via the repository at:

    http://mybin.repository.steve.org.uk/

See here for some brief discussion:

    http://blog.steve.org.uk/oh__this_should_be_stunning_.html

Internally it uses the XPath Perl module HTML::TreeBuilder::XPath, but
the details probably don't matter.

Steve
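For anyone curious what "cutting a class" amounts to, here is a minimal sketch of the idea using only Python's standard-library html.parser. It is not how html-tool works (that uses XPath in Perl); the names ClassCutter and cut_class are made up for illustration, and the sketch assumes the elements inside each advertisement block are properly nested and closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassCutter(HTMLParser):
    """Hypothetical helper: re-emit HTML, dropping elements with a given class."""

    def __init__(self, cls):
        super().__init__(convert_charrefs=False)
        self.cls = cls
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element being removed

    def _has_class(self, attrs):
        return self.cls in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
        elif self._has_class(attrs):
            if tag not in VOID:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never affect the nesting depth.
        if not self.skip_depth and not self._has_class(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def cut_class(html, cls="advertisement"):
    """Return html with every element carrying class `cls` removed."""
    parser = ClassCutter(cls)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To process many files, one could call cut_class() on the text of each file in a loop and write the result to a separate output directory, keeping the originals until the output has been checked.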