Hi,
This should be easy; try something like

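      // Fallback: if the parser found no title, take it straight from the raw HTML.
      // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
      if (title.equals("")) {
          // Non-greedy, case-insensitive, and allowed to span line breaks
          Pattern p = Pattern.compile("<title[^>]*>(.*?)</title>",
                  Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
          Matcher m = p.matcher(text);
          if (m.find()) {
              // group(1) is the text between the tags, without the tags themselves
              title = m.group(1).trim();
          }
      }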

after line 194 in HtmlParser.java
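If you want to check the regex outside the parser first, here is a small standalone sketch. The class name TitleFallback and the method extractTitle are made up for this example; they are not part of HtmlParser.java. It only pulls out the raw text between the first <title> and </title>, so entities such as &amp; are not decoded and a <title> inside a comment or script would still match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlParser.java.
class TitleFallback {
    // Case-insensitive, may span newlines, tolerates attributes on <title>.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed title text, or "" if no <title> element is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Malformed page: a <div> appears before <html>, as in the thread below.
        String html = "<div>ad</div><html><head>\n<TITLE>\n  My Page\n</TITLE></head></html>";
        System.out.println(extractTitle(html)); // prints "My Page"
    }
}
```

Because the regex ignores the surrounding document structure, it still finds the title on pages where the markup is broken before the <html> tag.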

Best regards,
Magnus

On Fri, Aug 28, 2009 at 8:07 PM, Alexey Torochkov <all.net...@gmail.com> wrote:

>
> On Fri, Aug 28, 2009 at 7:39 PM, Fuad Efendi <f...@efendi.ca> wrote:
>
>>  Some bad guys even put a <div> before the <html> tag; check the Google cached
>> page :-)
>>
>> (just joking...)
>>
>> Wonderfully, browsers understand that...
>>
> :-P
> Without sarcasm or irony... I just wanted to say that if a page has a
> title, it should be extracted anyway.
>
> --
> Alexey Torochkov
