Hi all,
Regarding NUTCH-251: judging by the votes, it is a relatively significant
issue.
Stefan has written a good plugin for the admin GUI, and I have updated it
to work with nutch-0.8 and hadoop 0.4.
Some of the features in the patch are not appropriate for our use cases,
and it requires hadoop changes, so I am currently working on an
alternative implementation of the administration GUI. It consists of a
hadoop server (like JobTracker) that listens for submitted jobs, a web GUI
to submit and track the jobs from the browser, and a job runner.
The architecture details of the patch are as follows:
- An abstract class, AdminJob, representing a job in nutch.
- Various classes extending AdminJob, e.g. FetchAdminJob, IndexAdminJob.
- A queue which sorts the jobs in priority order, using a modified
topological sort (jobs can depend on each other).
- An interface to submit jobs.
- An RPC server to listen for job submissions.
- An extension point (basically the same as the previous one).
- A web server to serve the plugins' JSPs.
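To make the job and queue items above more concrete, here is a rough sketch
of how AdminJob and the dependency-aware queue might fit together. It is not
the code from the patch: AdminJobQueue, dependsOn, executionOrder and the
integer priority field are hypothetical names chosen for illustration, the
"modified topological sort" is approximated by Kahn's algorithm with a
priority-ordered ready set, and it uses current Java syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical base class: a named job with a priority and dependencies. */
abstract class AdminJob {
    private final String name;
    private final int priority; // assumption: a higher value runs earlier
    private final List<AdminJob> dependencies = new ArrayList<>();

    AdminJob(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    String getName() { return name; }
    int getPriority() { return priority; }
    List<AdminJob> getDependencies() { return dependencies; }

    /** Declares that this job must not start before {@code other} is done. */
    AdminJob dependsOn(AdminJob other) {
        dependencies.add(other);
        return this;
    }

    abstract void run() throws Exception;
}

class FetchAdminJob extends AdminJob {
    FetchAdminJob(int priority) { super("fetch", priority); }
    @Override void run() { /* placeholder: a real job would start a fetch */ }
}

class IndexAdminJob extends AdminJob {
    IndexAdminJob(int priority) { super("index", priority); }
    @Override void run() { /* placeholder: a real job would start indexing */ }
}

/** Hypothetical queue: orders jobs by priority while respecting dependencies. */
class AdminJobQueue {
    private final List<AdminJob> submitted = new ArrayList<>();

    void submit(AdminJob job) { submitted.add(job); }

    /** Kahn's algorithm; among runnable jobs the highest priority goes first. */
    List<AdminJob> executionOrder() {
        Map<AdminJob, Integer> pending = new HashMap<>();
        Map<AdminJob, List<AdminJob>> dependents = new HashMap<>();
        for (AdminJob job : submitted) {
            int unmet = 0;
            for (AdminJob dep : job.getDependencies()) {
                if (submitted.contains(dep)) { // ignore jobs never submitted
                    unmet++;
                    dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job);
                }
            }
            pending.put(job, unmet);
        }

        Comparator<AdminJob> byPriority =
            (a, b) -> Integer.compare(b.getPriority(), a.getPriority());
        PriorityQueue<AdminJob> ready = new PriorityQueue<>(byPriority);
        for (AdminJob job : submitted) {
            if (pending.get(job) == 0) {
                ready.add(job);
            }
        }

        List<AdminJob> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            AdminJob next = ready.poll();
            order.add(next);
            for (AdminJob d : dependents.getOrDefault(next, Collections.emptyList())) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d); // all of its dependencies are now scheduled
                }
            }
        }
        if (order.size() != submitted.size()) {
            throw new IllegalStateException("cyclic job dependencies");
        }
        return order;
    }
}
```

With this sketch, an IndexAdminJob that depends on a FetchAdminJob is ordered
after the fetch even if the index job was submitted first and has the higher
priority.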
Upon completion, the features will be:
- submitting jobs from code, the command line, or the web interface
- tracking jobs from the command line or the web interface
- scheduling jobs
I could send the code or details if anyone is interested in pretesting,
and I would appreciate any comments and suggestions on this. I am
planning to complete the patch and submit it to Jira ASAP.
Sami Siren wrote:
> Hello,
>
> It has been a while from a previous release (0.8.1) and looking at the
> great fixes done in trunk I'd start thinking about baking a new release
> soon.
>
> Looking at the jira roadmaps there are 1 blocking issues (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0 of
> which I think NUTCH-233 is safe to put in.
>
> The top 10 voted issues are currently:
>
> NUTCH-61 Adaptive re-fetch interval. Detecting umodified content
> NUTCH-48 "Did you mean" query enhancement/refignment feature
> NUTCH-251 Administration GUI
> NUTCH-289 CrawlDatum should store IP address
> NUTCH-36 Chinese in Nutch
> NUTCH-185 XMLParser is configurable xml parser plugin.
> NUTCH-59 meta data support in webdb
> NUTCH-92 DistributedSearch incorrectly scores results
> NUTCH-68 A tool to generate arbitrary fetchlists
> NUTCH-87 Efficient site-specific crawling for a large number of sites
>
> Are there any opinions about issues that should go in before the next
> release (Answering yes means that you are willing to provide a patch for
> it).
>
> --
> Sami Siren
>
>