Hi Aamir,

On Tue, Apr 3, 2012 at 12:05 PM, Aamir Khan <syst3m.w...@gmail.com> wrote:
> Exactly, I will have full summer to understand and get up to speed. But
> since my knowledge is very limited my proposal won't be too good.. :)

This doesn't need to be the case. In fact, it is crucial that the submission is of a reasonable quality. The original issue was pretty well discussed iirc, and there is also some code uploaded by the original author, so you could have a look at that over the next few days before making a crack at the submission.

I can say one thing for sure though: this issue might need to be branded more generically. Right now Nutch would benefit more from a generically oriented plugin for scraping various parts of HTML. The original author had a use-case-driven approach to this issue, which meant he had to extract very specific content from news sites. This may not suit you, and it certainly isn't everyone's cup of tea within the community.

It would be great if you could discuss, both in your application and on the Jira thread, how the issue could be opened up so that more Nutch users can benefit. As you are stepping up to apply here, how you wish to do this is entirely your own choice, so I would take the positives from the flexibility you have and focus on them within your submission.

Does this sound reasonable? I look forward to seeing any progress you make, and will seriously consider stepping up to be a potential mentor, as it was me that added the issue to the GSoC list of projects.

Thank you
Lewis