2014-05-03 20:04 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>:
> Hi Talat,
> On Sat, May 3, 2014 at 4:35 AM, <dev-digest-h...@nutch.apache.org> wrote:
>> The parser plugin we currently use, NekoHTML, doesn't parse correctly.
> What is wrong with it? Are there any issues in Jira to back this up?
>> When I tested it
>> on a huge website, it left HTML tags in the output.
> Pretty vague. Anything else? Any more details? Can this be implemented in
> existing parser plugins?
>> IMHO our parser is a little
>> bit old.
> Which one? Is it possible to upgrade? I don't know which parser you mean.
>> After doing some research, I found the Jsoup[1] and Gumbo[2]
>> parsers. I did some tests on broken websites and saw that Gumbo and Jsoup
>> parse them very similarly to Google's parser.
> So what are the benefits? If we have a clear-cut argument, then let's go for
> it. If not then maybe your time would be better invested elsewhere. It's up
> to you I suppose :)

Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
