Hi, a large share of the sites I crawl have no meaningful text inside <title></title>. By contrast, the <h1></h1> often carries much more detailed information about the content of the page.
How can I get the content of (only) the first <h1></h1> element into a separate field, called e.g. "alttitle", so that I can index it later? I've given up trying to understand HtmlParser.java. For the indexing part, I know how to create a plugin (e.g. based on index-more). My big problem is writing a custom parse plugin or modifying the HTML parser. Unfortunately, all the previous hints I found on this mailing list assume that I am a Java expert. I think I need somewhat more detailed help ;-)

Thanks for any help,
Felix
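
P.S. To make the goal concrete, here is a rough standalone sketch of the extraction step only, written against the plain JDK DOM classes and not against Nutch. The class name FirstH1 and the helper firstH1Of are made up for illustration, and the sample page has to be well-formed XML for the JDK parser to accept it. As far as I understand, a Nutch parse filter plugin is handed an already-parsed DOM, so the tree walk might carry over, but where to hook it in and how to store the result as "alttitle" is exactly the part I am missing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstH1 {

    // Depth-first walk in document order. Returns the trimmed text of the
    // first <h1> element found (including text of nested tags), or null.
    static String firstH1(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "h1".equalsIgnoreCase(node.getNodeName())) {
            return node.getTextContent().trim();
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            String found = firstH1(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Hypothetical convenience helper: parses a well-formed XHTML string
    // with the JDK parser (NOT the Nutch HTML parser) and runs the walk.
    static String firstH1Of(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return firstH1(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Home</title></head><body>"
                + "<h1>Detailed <em>page</em> heading</h1>"
                + "<h1>Second heading</h1></body></html>";
        // Only the first <h1> is wanted; the second one is ignored.
        System.out.println(firstH1Of(page));
    }
}
```

The value returned here is what I would like to end up in the "alttitle" field; pages without any <h1> would give null and should simply get no such field.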
