Re: [jira] Updated: (NUTCH-251) Administration GUI

TDLN Wed, 26 Apr 2006 03:35:47 -0700

Stefan,

this patch looks very interesting. I would like to test it, but first
I have several questions.


1) I am using the local filesystem. Since you say that locally running
jobs cannot be stopped, does that imply that scheduling does not work
on the local filesystem either?

2) Do you think it makes sense to have a language bundle in Dutch?
Personally I don't, because I have never met a Dutch developer who
doesn't speak English, but I could do it anyway.

3) What process would you like to use to filter bugs before we add
them to Jira, so that we don't post known issues or duplicates?

Thanks for all the work.

Rgrds, Thomas Delnoij

On 4/22/06, Stefan Groschupf (JIRA) <[EMAIL PROTECTED]> wrote:
>      [ http://issues.apache.org/jira/browse/NUTCH-251?page=all ]
>
> Stefan Groschupf updated NUTCH-251:
> -----------------------------------
>
>     Attachment: hadoop_nutch_gui_v1.patch
>                 nutch_gui_v1.patch
>                 nutch_gui_plugins_v1.zip
>
> This is an early preview patch of the Nutch GUI.
> There are known issues; however, it is a starting point from which we can
> continue building a solid administration user interface.
>
>
> This patch introduces the following functionality:
>
> + web-based administration GUI via an embedded web container
> + the GUI is fully based on the plugin system, so it can be customized and
> extended with plugins
> + all plugins can be internationalized
> + introduces the concept of Nutch instances, a mechanism for running
> separately configured Nutch deployments from the same code base (e.g.
> intranet search, web page search)
> + pluggable authentication: currently it comes with a default user/password
> pair taken from the configuration, but LDAP integration, for example, could
> easily be added.
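The notes do not show the authentication plugin contract itself. A minimal sketch of what "pluggable authentication" could look like, assuming a hypothetical `Authenticator` interface and made-up configuration keys (`gui.admin.user`, `gui.admin.password`); the "admin"/"admin" default mirrors the credentials mentioned in the how-to. This is an illustration, not the patch's actual API.

```java
import java.util.Map;

// Hypothetical plugin contract; the real GUI patch may name and shape this differently.
interface Authenticator {
    boolean authenticate(String user, String password);
}

// Sketch of the default behaviour described above: a single user/password
// pair read from configuration, falling back to "admin"/"admin".
class ConfigAuthenticator implements Authenticator {
    private final String user;
    private final String password;

    ConfigAuthenticator(Map<String, String> conf) {
        // "gui.admin.user" and "gui.admin.password" are made-up keys for illustration.
        this.user = conf.getOrDefault("gui.admin.user", "admin");
        this.password = conf.getOrDefault("gui.admin.password", "admin");
    }

    @Override
    public boolean authenticate(String user, String password) {
        // equals() returns false for null arguments, so missing input is rejected.
        return this.user.equals(user) && this.password.equals(password);
    }
}

public class AuthDemo {
    public static void main(String[] args) {
        Authenticator auth = new ConfigAuthenticator(Map.of());
        System.out.println(auth.authenticate("admin", "admin"));  // true
        System.out.println(auth.authenticate("admin", "secret")); // false
    }
}
```

An LDAP-backed plugin would simply be another implementation of the same interface, which is what makes the mechanism pluggable.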
>
> The patch comes with the following plugins:
> + admin-listing
> ++ required by the web UI to show all deployed plugins as tabs on a web page
>
> + admin-instance
> ++ lists all instances and allows to create a new instance
>
> + admin-configuration
> ++ configures a Nutch instance (the configuration is written to disk as
> nutch-site.xml)
>
> + admin-inject
> ++ injects URLs into a crawlDb
>
> + admin-system
> ++ shows the system status
>
> + admin-job
> ++ shows the status of jobs
>
> + admin-crawldb-status
> ++ shows crawldb entries filtered by status, or shows the status of a given
> URL (useful for checking whether a page has already been fetched)
>
> + admin-management
> ++ generate segment
> ++ fetch segment
> ++ parse segment (if required)
> ++ update crawldb
> ++ invert links
> ++ index segment
> ++ delete segment, parse, index etc.
>
> + admin-scheduling
> ++ Quartz-based cron job management to run a time-driven "generate - fetch -
> updatedb - invertlinks - index" job
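The scheduled job chains five steps, and each one depends on the output of the previous one. A minimal sketch of that ordering, assuming a hypothetical `StepRunner` hook that would launch the matching Nutch job; the Quartz wiring is left out, and the optional parse step from admin-management is omitted because the scheduling description does not list it.

```java
import java.util.ArrayList;
import java.util.List;

// The steps of one scheduled crawl round, in the order listed above.
enum CrawlStep { GENERATE, FETCH, UPDATEDB, INVERTLINKS, INDEX }

// Hypothetical hook: in a real deployment each call would run the matching Nutch job.
interface StepRunner {
    boolean run(CrawlStep step);
}

public class CrawlCycle {
    // Runs the steps in order and stops at the first failure, so an index is
    // never built from a segment whose fetch did not complete.
    // Returns the steps that completed successfully.
    static List<CrawlStep> runCycle(StepRunner runner) {
        List<CrawlStep> done = new ArrayList<>();
        for (CrawlStep step : CrawlStep.values()) {
            if (!runner.run(step)) {
                break;
            }
            done.add(step);
        }
        return done;
    }

    public static void main(String[] args) {
        // Simulate a fetch failure: the cycle stops after GENERATE.
        List<CrawlStep> done = runCycle(step -> step != CrawlStep.FETCH);
        System.out.println(done); // [GENERATE]
    }
}
```

A time-driven trigger would then call `runCycle` on a cron schedule; in Quartz cron syntax (which has a leading seconds field), `0 0 2 * * ?` would fire once a day at 02:00, for example.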
>
>
> Known issues
> + requires Hadoop changes
> + locally running jobs cannot be stopped, but jobs running in distributed
> mode can
> + the index searcher does not look for index folders inside segment folders
> (as Nutch 0.7 did), but the GUI places the index folder in the segment folder
> ++ as a result, the searcher is unable to find the indices
> + "put to search" does not work, since the searcher does not support adding
> index folders dynamically
> + the linkdb inverter overwrites an existing linkdb instead of updating it;
> this is a general Nutch bug, but it affects the GUI as well
> + the Nutch GUI introduces locking by storing lock files in folders, but the
> Nutch command-line tools ignore this mechanism
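Folder lock files of this kind are advisory: they only protect data if every tool checks them. A minimal sketch on the local filesystem, assuming a hypothetical lock file name (`gui.lock`); the patch's actual file name and filesystem handling are not stated. Note that the JDK documentation for `File.createNewFile()` advises against using it for robust file locking (it recommends `FileLock` instead), so this only illustrates the idea.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FolderLock {
    // Hypothetical lock file name; not taken from the patch.
    static final String LOCK_NAME = "gui.lock";

    // createNewFile() atomically creates the file only if it does not exist
    // yet, so only one caller that checks the lock can acquire it.
    static boolean tryLock(File folder) throws IOException {
        return new File(folder, LOCK_NAME).createNewFile();
    }

    static void unlock(File folder) {
        new File(folder, LOCK_NAME).delete();
    }

    public static void main(String[] args) throws IOException {
        File crawlDb = Files.createTempDirectory("crawldb").toFile();
        System.out.println(tryLock(crawlDb)); // true: first GUI job gets the lock
        System.out.println(tryLock(crawlDb)); // false: a second GUI job must wait
        // A command-line tool that never calls tryLock() could still write here,
        // which is exactly the known issue described above.
        unlock(crawlDb);
        crawlDb.delete();
    }
}
```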
>
>
>
> It would be great if users could test the GUI, report bugs and help improve
> the patch.
> This is a very complex patch and it is difficult to stay in sync with the
> latest changes, so if we missed something when generating this patch and it
> does not work as expected, please don't blame us; give us some time and
> hints to fix the problems.
>
>
> Help is welcome with the following tasks:
> + fixing language issues in the javadoc, API and bundle files
> + translating bundles into more languages (currently it comes with English
> and German bundles)
> + heavily testing, finding bugs and providing fixes :)
> + writing help texts and documentation
>
> How to:
>
> + check out the latest Nutch sources
>
> + check out the Hadoop sources
> + patch Hadoop with the Hadoop patch (hadoop_nutch_gui_v1.patch)
> + build the Hadoop jar
> + remove the old Hadoop jar from nutch/lib
> + place the new Hadoop jar in nutch/lib
>
>
> + unzip the plugin zip file (nutch_gui_plugins_v1.zip)
> + place the plugins in nutch/src/plugins (a patch is not possible since svn
> does not support binary patches)
> + patch Nutch with the Nutch patch (nutch_gui_v1.patch)
> + start the GUI with bin/nutch gui <folderWhereYourInstanceDataWillBeStored>
> + point your browser to: http://localhost:50060/general/
> + the username and password are both "admin" (they can be changed in
> nutch-default.xml)
> + select the "default" instance or create a new instance.
>
>
>
> Thanks to everybody who helped get this implemented and did the first beta
> tests, but especially to Marko for hacking all the JSPs!
> I suggest adding this patch to a Nutch 0.9 branch and adding a GUI component
> in Jira to go from there.
> I really hope I didn't miss anything or upload the wrong files. :-O
>
> > Administration GUI
> > ------------------
> >
> >          Key: NUTCH-251
> >          URL: http://issues.apache.org/jira/browse/NUTCH-251
> >      Project: Nutch
> >         Type: Improvement
>
> >     Versions: 0.8-dev
> >     Reporter: Stefan Groschupf
> >     Priority: Minor
> >      Fix For: 0.8-dev
> >  Attachments: hadoop_nutch_gui_v1.patch, nutch_gui_plugins_v1.zip, 
> > nutch_gui_v1.patch
> >
> > Having a web-based administration interface would help make Nutch
> > administration and management much more user-friendly.
>
