[Nutch Wiki] Update of "Nutch_1.X_RESTAPI" by SujenShah

Apache Wiki Sat, 21 Feb 2015 20:09:50 -0800

The "Nutch_1.X_RESTAPI" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI

New page:
= Nutch 1.x REST API =

<<TableOfContents(4)>>

== Introduction ==
This page documents the Nutch 1.x REST API.

It describes the REST calls that can be made to the Nutch 1.x REST API. Many of
the API points are adapted from those provided by the
[[https://wiki.apache.org/nutch/NutchRESTAPI|Nutch 2.x REST API]]. One
motivation for providing a REST API is to integrate D3 for visualizing how a
Nutch crawl is working.


== REST API Calls ==
=== Administration ===
This API point reports the server status and manages the server's state.
==== Get server status ====
{{{{
GET /admin
}}}}

__Response__ contains the server startup date, available configuration names,
job history, and currently running jobs.
{{{{
{
   "startDate":1424572500000,
   "configuration":[
      "default"
   ],
   "jobs":[

   ],
   "runningJobs":[

   ]
}
}}}}
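
As a minimal sketch of consuming this response, the Python snippet below parses the sample payload and summarizes it. It assumes that `startDate` is expressed in milliseconds since the Unix epoch (the sample value is consistent with that reading, but this page does not state it). The function name `parse_admin_status` is illustrative and not part of Nutch.

```python
import json
from datetime import datetime, timezone

# Sample body copied from the GET /admin response shown above.
SAMPLE_ADMIN_RESPONSE = """
{
   "startDate":1424572500000,
   "configuration":["default"],
   "jobs":[],
   "runningJobs":[]
}
"""


def parse_admin_status(body: str) -> dict:
    """Parse a GET /admin response body into a small summary dict."""
    status = json.loads(body)
    # Assumption: startDate is milliseconds since the Unix epoch.
    started = datetime.fromtimestamp(status["startDate"] / 1000, tz=timezone.utc)
    return {
        "started": started,
        "configurations": status["configuration"],
        "job_count": len(status["jobs"]),
        "running_count": len(status["runningJobs"]),
    }


summary = parse_admin_status(SAMPLE_ADMIN_RESPONSE)
print(summary["started"].isoformat())
print(summary["configurations"], summary["job_count"], summary["running_count"])
```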

==== Stop server ====
A running server can be stopped using ''/admin/stop''.
{{{{
GET /admin/stop
}}}}

__Response__
{{{{
Stopping in 5 seconds.
}}}}

=== Jobs ===
This API point provides job management: creating jobs, retrieving job
information, and stopping or killing jobs.
==== Listing all jobs ====
{{{{
GET /job
}}}}

__Response__ contains a list of all jobs (running and historical).
{{{{
[
   {
      "id":"job-id-5977",
      "type":"FETCH",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"FINISHED",
      "msg":"",
      "crawlId":"crawl-01"
   },
   {
      "id":"job-id-5978",
      "type":"PARSE",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"RUNNING",
      "msg":"",
      "crawlId":"crawl-01"
   }
]
}}}}
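
A client will often want only the jobs in a particular state. The Python sketch below filters a job list of the shape shown above. The helper name `job_ids_by_state` is hypothetical, and the only state values used are the two that appear in the sample (`FINISHED` and `RUNNING`).

```python
import json

# Job list in the shape returned by GET /job (fields trimmed for brevity).
SAMPLE_JOBS_RESPONSE = """
[
   {"id":"job-id-5977", "type":"FETCH", "state":"FINISHED", "crawlId":"crawl-01"},
   {"id":"job-id-5978", "type":"PARSE", "state":"RUNNING", "crawlId":"crawl-01"}
]
"""


def job_ids_by_state(body: str, state: str) -> list:
    """Return the ids of all jobs in the response whose state matches."""
    return [job["id"] for job in json.loads(body) if job["state"] == state]


running = job_ids_by_state(SAMPLE_JOBS_RESPONSE, "RUNNING")
finished = job_ids_by_state(SAMPLE_JOBS_RESPONSE, "FINISHED")
print(running, finished)
```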

==== Get job info ====
{{{{
GET /job/job-id-5977
}}}}

__Response__
{{{{
   {
      "id":"job-id-5977",
      "type":"FETCH",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"FINISHED",
      "msg":"",
      "crawlId":"crawl-01"
   }
}}}}
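
One possible use of this call is to poll a job until it completes. The sketch below is an illustration under stated assumptions, not part of Nutch: `fetch_job` is a hypothetical callable that performs `GET /job/{id}` and returns the parsed JSON, and `FINISHED` is treated as the only terminal state because it is the only completed state shown on this page. Other states may exist and can be passed in through `terminal_states`.

```python
import time


def wait_for_job(fetch_job, job_id, terminal_states=("FINISHED",),
                 max_polls=10, delay_seconds=1.0, sleep=time.sleep):
    """Poll fetch_job(job_id) until the job reaches a terminal state.

    fetch_job is a caller-supplied function (hypothetical here) that returns
    the parsed JSON object for GET /job/{job_id}.
    """
    for attempt in range(max_polls):
        job = fetch_job(job_id)
        if job["state"] in terminal_states:
            return job
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Stand-in fetcher that reports RUNNING once and then FINISHED.
_responses = iter([
    {"id": "job-id-5977", "state": "RUNNING"},
    {"id": "job-id-5977", "state": "FINISHED"},
])


def fake_fetch_job(job_id):
    return next(_responses)


result = wait_for_job(fake_fetch_job, "job-id-5977", sleep=lambda _s: None)
print(result["state"])
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running server.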

==== Stop job ====
{{{{
GET /job/job-id-5977/stop
}}}}

__Response__
{{{{
  true
}}}}


==== Kill job ====
{{{{
GET /job/job-id-5977/abort
}}}}

__Response__
{{{{
  true
}}}}

==== Create job ====
Creates a job with the given parameters. Specify either a job type (such as
INJECT, GENERATE, FETCH, or PARSE) or a jobClassName.
{{{{
POST /job/create
   {
      "crawlId":"crawl-01",
      "type":"FETCH",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }

POST /job/create
   {
      "crawlId":"crawl-01",
      "jobClassName":"org.apache.nutch.fetcher.FetcherJob",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }
}}}}

__Response__ is the ID of the created job.
{{{{
    job-id-43243
}}}}
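
The request body can be assembled programmatically. The Python sketch below builds, but does not send, a `POST /job/create` request using only the standard library. Several details are assumptions rather than facts from this page: the base URL `http://localhost:8081` is a placeholder, the `Content-Type: application/json` header is a guess at what the server expects, and the "either type or jobClassName" rule is read here as "exactly one of the two". The function name `build_create_job_request` is illustrative.

```python
import json
import urllib.request


def build_create_job_request(base_url, crawl_id, conf_id="default",
                             job_type=None, job_class_name=None, args=None):
    """Build (without sending) a POST /job/create request."""
    # Assumed reading of the rule above: exactly one of the two must be given.
    if (job_type is None) == (job_class_name is None):
        raise ValueError("specify exactly one of job_type or job_class_name")

    body = {"crawlId": crawl_id, "confId": conf_id}
    if job_type is not None:
        body["type"] = job_type
    else:
        body["jobClassName"] = job_class_name
    if args is not None:
        body["args"] = args

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# Placeholder host and port; adjust to wherever the server is running.
request = build_create_job_request(
    "http://localhost:8081", "crawl-01",
    job_type="FETCH", args={"someParam": "someValue"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually submit the job, the request object could be passed to `urllib.request.urlopen`, whose response body would then carry the new job's ID.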

=== URL ===

This API point returns the information about a URL or list of URLs that is
needed to generate a D3 visualization.
{{{{
GET /url/{filtered-url}
}}}}
__Response__ contains information about the URL, obtained from the
CrawlDbReader.java class. The fields are:
{{{{
   {
      "url" : "",
      "statusCode" : "",
      "fetchTime" : "",
      "score" : "",
      "numOfInlinks" : "",
      "numOfOutlinks" : "",
   }
}}}}

== More ==
Descriptions of more API points are coming soon.
