[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913040#action_12913040
]
Doğacan Güney commented on NUTCH-880:
-------------------------------------
+1 from me.
I think we can combine the approach you outlined in NUTCH-907 with this one.
Instead of using confIds to identify
different configs, we could use different crawl prefixes (or whatever we end up
calling them) to identify different crawl sets (though
we would still need a way to attach different configs to different crawl sets).
I think the API overall looks good. Maybe we could replace all the Map<String,
Object> instances with dedicated classes, though.
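For illustration, such a dedicated class might look like the sketch below. The names (JobStatus, State, the fields) are hypothetical and not taken from the patch; the point is just that a typed result object is easier to evolve and document than a generic map:

```java
// Hypothetical sketch: a dedicated status class instead of Map<String, Object>.
// Names and fields are illustrative only, not part of the NUTCH-880 patch.
public class JobStatusSketch {

    enum State { RUNNING, FINISHED, FAILED }

    static final class JobStatus {
        final String jobId;
        final State state;
        final float progress; // fraction complete, 0.0 .. 1.0

        JobStatus(String jobId, State state, float progress) {
            this.jobId = jobId;
            this.state = state;
            this.progress = progress;
        }
    }

    public static void main(String[] args) {
        JobStatus s = new JobStatus("inject-001", State.RUNNING, 0.25f);
        System.out.println(s.jobId + " " + s.state + " " + s.progress);
    }
}
```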
A minor question:
In JobManager.java:
+ public static enum JobType {INJECT, GENERATE, FETCH, PARSE, UPDATEDB, INDEX, CRAWL, CLASS};
What is "CLASS"?
Btw, Andrzej, I will be happy to help out with the implementation if you want.
> REST API (and webapp) for Nutch
> -------------------------------
>
> Key: NUTCH-880
> URL: https://issues.apache.org/jira/browse/NUTCH-880
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 2.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Attachments: API.patch
>
>
> This issue is for discussing a REST-style API for accessing Nutch.
> Here's an initial idea:
> * I propose to use org.restlet for handling requests and returning
> JSON/XML/whatever responses.
> * hook up all regular tools so that they can be driven via this API. This
> would have to be an async API, since all Nutch operations take a long time to
> execute. It follows that we also need to be able to list running
> operations, retrieve their current status, and possibly
> abort/cancel/stop/suspend/resume/...? This also means that we would
> potentially have to create & manage many threads in a servlet - AFAIK this is
> frowned upon by J2EE purists...
> * package this in a webapp (that includes all deps, essentially nutch.job
> content), with the restlet servlet as an entry point.
> Open issues:
> * how to implement the reading of crawl results via this API
> * should we manage only crawls that use a single configuration per webapp, or
> should we have a notion of crawl contexts (sets of crawl configs) with CRUD
> ops on them? The latter would be nice, because it would allow managing
> several different crawls, with different configs, in a single webapp - but it
> complicates the implementation a lot.
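The async pattern discussed above (submit a long-running operation, then list, poll, and abort it) could be sketched in plain Java along these lines. This is a minimal illustration with hypothetical names, not the API proposed in the patch, and it uses java.util.concurrent rather than any servlet- or restlet-specific machinery:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch (names hypothetical) of async job management:
// submitted jobs run on an executor, and callers can poll or
// cancel them by job id instead of blocking on completion.
public class AsyncJobManagerSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    // Start a long-running operation and return immediately with its id.
    public String submit(String jobId, Runnable job) {
        jobs.put(jobId, pool.submit(job));
        return jobId;
    }

    // Poll current status: RUNNING until the task completes or is cancelled.
    public String status(String jobId) {
        Future<?> f = jobs.get(jobId);
        if (f == null) return "UNKNOWN";
        return f.isDone() ? "DONE" : "RUNNING";
    }

    // Best-effort abort, interrupting the worker thread if needed.
    public boolean cancel(String jobId) {
        Future<?> f = jobs.get(jobId);
        return f != null && f.cancel(true);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncJobManagerSketch mgr = new AsyncJobManagerSketch();
        mgr.submit("fetch-1", () -> { /* stand-in for a fetch cycle */ });
        Thread.sleep(200); // give the (trivial) job time to finish
        System.out.println("fetch-1 " + mgr.status("fetch-1"));
        mgr.shutdown();
    }
}
```

A real implementation would also need per-job configuration (the crawl-context question above) and persistent status that survives the webapp, but the Future-based registry captures the list/status/cancel surface the description asks for.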
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.