[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujen Shah updated NUTCH-2132:
Attachment: NUTCH-2132.v2.patch

Hi folks,

I found a few more cases where exceptions are thrown. One is when an exchange with the same name but a different exchange type is declared while the RabbitMQ server has not been restarted; this raises an error. Another is when the RabbitMQ server goes down after a successful initialization. I am uploading a new patch (NUTCH-2132.v2.patch) that catches all exceptions instead of only the specific ones caught in the earlier patch.

> Publisher/Subscriber model for Nutch to emit events
>
> Key: NUTCH-2132
> URL: https://issues.apache.org/jira/browse/NUTCH-2132
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, REST_api
> Reporter: Sujen Shah
> Labels: memex
> Fix For: 1.12
>
> Attachments: NUTCH-2132.patch, NUTCH-2132.v2.patch, PubSub_routingkey.patch
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g.
> fetcher events like fetch-start, fetch-end, and a fetch report which may contain
> data such as the outlinks of the currently fetched URL, its score, etc.).
> A consumer of this functionality could use this data to generate real-time
> visualizations and statistics of the crawl without having to wait for
> the fetch round to finish.
> The REST API could contain an endpoint which would respond with a URL to
> which a client could subscribe to get the fetcher events.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
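The "catch all exceptions" approach from the v2 patch can be sketched as a wrapper that sits between the fetcher and whatever queue client is in use, so a broker failure (such as the exchange-type conflict or a server that went down after initialization) drops the event instead of killing the fetch. This is a minimal sketch, not the patch's code: the `EventPublisher` interface and the `SafePublisher` class are hypothetical names, and the failure counter is an added assumption to keep swallowed errors observable.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical minimal publisher contract; not an actual Nutch interface. */
interface EventPublisher {
  void publish(String routingKey, String message) throws Exception;
}

/**
 * Wraps a publisher so that any Exception thrown while publishing is logged
 * and counted instead of propagating into the fetcher thread.
 */
final class SafePublisher implements EventPublisher {
  private static final Logger LOG = Logger.getLogger(SafePublisher.class.getName());
  private final EventPublisher delegate;
  private final AtomicLong failures = new AtomicLong();

  SafePublisher(EventPublisher delegate) {
    this.delegate = delegate;
  }

  @Override
  public void publish(String routingKey, String message) {
    try {
      delegate.publish(routingKey, message);
    } catch (Exception e) {
      // Catch-all: a failed event must not fail the crawl itself.
      failures.incrementAndGet();
      LOG.log(Level.WARNING, "Dropping event for routing key " + routingKey, e);
    }
  }

  /** Number of events dropped because the delegate threw. */
  long failureCount() {
    return failures.get();
  }
}
```

Catching `Exception` broadly trades precise error handling for crawl robustness; counting the drops (or exposing them as a Hadoop counter in a real job) keeps a misconfigured broker from failing silently.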
[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujen Shah updated NUTCH-2132:
Attachment: PubSub_routingkey.patch

Patch to route different crawls with different routing keys set in the configuration. This patch includes:
1. A property, fetcher.publisher, to choose whether or not to use a publisher.
2. Event creation collapsed into a single method.

> Publisher/Subscriber model for Nutch to emit events
>
> Key: NUTCH-2132
> URL: https://issues.apache.org/jira/browse/NUTCH-2132
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, REST_api
> Reporter: Sujen Shah
> Labels: memex
> Fix For: 1.12
>
> Attachments: NUTCH-2132.patch, PubSub_routingkey.patch
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g.
> fetcher events like fetch-start, fetch-end, and a fetch report which may contain
> data such as the outlinks of the currently fetched URL, its score, etc.).
> A consumer of this functionality could use this data to generate real-time
> visualizations and statistics of the crawl without having to wait for
> the fetch round to finish.
> The REST API could contain an endpoint which would respond with a URL to
> which a client could subscribe to get the fetcher events.
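The two changes in this patch (a configuration switch plus a per-crawl routing key, and a single event-creation method) might look roughly like the sketch below. Only the `fetcher.publisher` property name comes from the comment; the routing-key property name `publisher.routingkey`, the `"false"` default, the `"default"` fallback key, and the event field names are all hypothetical. A plain `Map` stands in for Hadoop's `Configuration` to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class PublisherSettings {
  /** Property named in the patch comment: turns event publishing on or off. */
  static final String ENABLED_KEY = "fetcher.publisher";
  /** Hypothetical property holding the routing key for this crawl. */
  static final String ROUTING_KEY = "publisher.routingkey";

  /** Returns the routing key to publish with, or empty when publishing is disabled. */
  static Optional<String> routingKey(Map<String, String> conf) {
    // Assumed default: publishing stays off unless explicitly enabled.
    boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
    if (!enabled) {
      return Optional.empty();
    }
    // Assumed fallback key for crawls that do not configure their own.
    return Optional.of(conf.getOrDefault(ROUTING_KEY, "default"));
  }

  /** One factory for every event type instead of a separate method per event. */
  static Map<String, Object> createEvent(String type, String url, Map<String, Object> data) {
    Map<String, Object> event = new HashMap<>();
    event.put("type", type);
    event.put("url", url);
    event.put("timestamp", System.currentTimeMillis());
    // Type-specific fields are nested so they cannot overwrite the common ones.
    event.put("data", data != null ? data : new HashMap<String, Object>());
    return event;
  }
}
```

With per-crawl routing keys, two crawls can share one exchange while each subscriber binds its queue only to the key of the crawl it cares about.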
[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated NUTCH-2132:
Fix Version/s: 1.12 (was: 1.11)

> Publisher/Subscriber model for Nutch to emit events
>
> Key: NUTCH-2132
> URL: https://issues.apache.org/jira/browse/NUTCH-2132
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, REST_api
> Reporter: Sujen Shah
> Labels: memex
> Fix For: 1.12
>
> Attachments: NUTCH-2132.patch
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g.
> fetcher events like fetch-start, fetch-end, and a fetch report which may contain
> data such as the outlinks of the currently fetched URL, its score, etc.).
> A consumer of this functionality could use this data to generate real-time
> visualizations and statistics of the crawl without having to wait for
> the fetch round to finish.
> The REST API could contain an endpoint which would respond with a URL to
> which a client could subscribe to get the fetcher events.
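The last sentence of the quoted description proposes a REST endpoint that hands clients the address to subscribe to. A minimal sketch of that idea follows, using the JDK's built-in `com.sun.net.httpserver` as a stand-in; it is not Nutch's REST API. The `/publisher/subscribe` path, the JSON field names, and the assumption that the answer is a broker URL plus exchange and routing key are all hypothetical. The JSON is built by hand and assumes values need no escaping.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Stand-in for a REST resource that tells clients where to subscribe. */
final class SubscriptionEndpoint {
  /** Hypothetical response shape; values are assumed not to need JSON escaping. */
  static String subscriptionInfo(String brokerUrl, String exchangeName, String routingKey) {
    return "{\"broker\":\"" + brokerUrl + "\",\"exchange\":\"" + exchangeName
        + "\",\"routingKey\":\"" + routingKey + "\"}";
  }

  /** Serves the given JSON body at a hypothetical /publisher/subscribe path. */
  static HttpServer start(int port, String json) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
    server.createContext("/publisher/subscribe", http -> {
      http.getResponseHeaders().set("Content-Type", "application/json");
      http.sendResponseHeaders(200, bytes.length);
      try (OutputStream os = http.getResponseBody()) {
        os.write(bytes);
      }
    });
    server.start();
    return server;
  }
}
```

With RabbitMQ as the backend, a client reading this response would declare its own queue and bind it to the named exchange with the given routing key, so the crawler never needs to know how many subscribers exist.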
[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujen Shah updated NUTCH-2132:
Description:
It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g. fetcher events like fetch-start, fetch-end, and a fetch report which may contain data such as the outlinks of the currently fetched URL, its score, etc.).
A consumer of this functionality could use this data to generate real-time visualizations and statistics of the crawl without having to wait for the fetch round to finish.
The REST API could contain an endpoint which would respond with a URL to which a client could subscribe to get the fetcher events.

was:
It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g. fetcher events).
A consumer of this functionality could use this data to generate real-time visualizations and statistics of the crawl without having to wait for the fetch round to finish.
The REST API could contain an endpoint which would respond with a URL to which a client could subscribe to get the fetcher events.

> Publisher/Subscriber model for Nutch to emit events
>
> Key: NUTCH-2132
> URL: https://issues.apache.org/jira/browse/NUTCH-2132
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, REST_api
> Reporter: Sujen Shah
> Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-2132.patch
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g.
> fetcher events like fetch-start, fetch-end, and a fetch report which may contain
> data such as the outlinks of the currently fetched URL, its score, etc.).
> A consumer of this functionality could use this data to generate real-time
> visualizations and statistics of the crawl without having to wait for
> the fetch round to finish.
> The REST API could contain an endpoint which would respond with a URL to
> which a client could subscribe to get the fetcher events.
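The updated description lists three event kinds (fetch-start, fetch-end, and a fetch report carrying outlinks and score) and a consumer that builds crawl statistics while the round is still running. A sketch of that data model and one such consumer follows; the class names, the `float` score type, and the choice of "in-flight URLs" and "outlinks seen" as statistics are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Event kinds taken from the issue description; the enum itself is hypothetical. */
enum FetcherEventType { FETCH_START, FETCH_END, FETCH_REPORT }

/** Immutable event; outlinks and score are only meaningful for FETCH_REPORT. */
final class FetcherEvent {
  final FetcherEventType type;
  final String url;
  final List<String> outlinks;
  final float score;

  private FetcherEvent(FetcherEventType type, String url, List<String> outlinks, float score) {
    this.type = type;
    this.url = url;
    // Defensive copy so later changes by the caller do not alter a published event.
    this.outlinks = Collections.unmodifiableList(new ArrayList<>(outlinks));
    this.score = score;
  }

  static FetcherEvent start(String url) {
    return new FetcherEvent(FetcherEventType.FETCH_START, url, Collections.emptyList(), 0f);
  }

  static FetcherEvent end(String url) {
    return new FetcherEvent(FetcherEventType.FETCH_END, url, Collections.emptyList(), 0f);
  }

  static FetcherEvent report(String url, List<String> outlinks, float score) {
    return new FetcherEvent(FetcherEventType.FETCH_REPORT, url, outlinks, score);
  }
}

/** Example consumer that keeps running crawl statistics as events arrive. */
final class CrawlStats {
  private int started;
  private int finished;
  private long outlinksSeen;

  void accept(FetcherEvent e) {
    switch (e.type) {
      case FETCH_START: started++; break;
      case FETCH_END: finished++; break;
      case FETCH_REPORT: outlinksSeen += e.outlinks.size(); break;
      default: break;
    }
  }

  /** URLs whose fetch has started but not yet ended. */
  int inFlight() {
    return started - finished;
  }

  long outlinksSeen() {
    return outlinksSeen;
  }
}
```

Because every event is self-describing, a visualization client can start consuming mid-round and still compute meaningful running totals.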
[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujen Shah updated NUTCH-2132:
Attachment: NUTCH-2132.patch

Attaching a patch that describes my idea for a Pub/Sub model. It contains the events being published and an interface that could have multiple implementations of a queueing mechanism. I have currently implemented it using RabbitMQ. Feedback welcome :)

> Publisher/Subscriber model for Nutch to emit events
>
> Key: NUTCH-2132
> URL: https://issues.apache.org/jira/browse/NUTCH-2132
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, REST_api
> Reporter: Sujen Shah
> Labels: memex
> Fix For: 1.11
>
> Attachments: NUTCH-2132.patch
>
> It would be nice to have a Pub/Sub model in Nutch to emit certain events (e.g.
> fetcher events).
> A consumer of this functionality could use this data to generate real-time
> visualizations and statistics of the crawl without having to wait for
> the fetch round to finish.
> The REST API could contain an endpoint which would respond with a URL to
> which a client could subscribe to get the fetcher events.
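The design described here (one interface, several interchangeable queueing implementations, RabbitMQ being the first) can be illustrated with the sketch below. The `QueuePublisher` interface, the in-memory implementation, and the name-based factory are hypothetical and do not reproduce the patch; a RabbitMQ-backed class would simply be another entry in the factory map. Note that `Map.entry` rejects null topics or payloads.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical queueing abstraction; a RabbitMQ-backed class would be one implementation. */
interface QueuePublisher extends AutoCloseable {
  void publish(String topic, String payload);

  @Override
  void close(); // narrowed: no checked exception
}

/** In-memory implementation, e.g. for tests or a consumer in the same JVM. */
final class InMemoryQueuePublisher implements QueuePublisher {
  private final BlockingQueue<Map.Entry<String, String>> queue = new LinkedBlockingQueue<>();

  @Override
  public void publish(String topic, String payload) {
    queue.add(Map.entry(topic, payload));
  }

  /** Consumer side: the next (topic, payload) pair, or null if nothing is waiting. */
  Map.Entry<String, String> poll() {
    return queue.poll();
  }

  @Override
  public void close() {
    queue.clear();
  }
}

/** Chooses an implementation by a configured name; the names are illustrative. */
final class PublisherFactory {
  private static final Map<String, Supplier<QueuePublisher>> IMPLEMENTATIONS =
      Map.of("memory", InMemoryQueuePublisher::new);

  static QueuePublisher create(String name) {
    Supplier<QueuePublisher> supplier = IMPLEMENTATIONS.get(name);
    if (supplier == null) {
      throw new IllegalArgumentException("Unknown publisher implementation: " + name);
    }
    return supplier.get();
  }
}
```

Keeping the fetcher coded against the interface means swapping RabbitMQ for another broker is a configuration change rather than a fetcher change, and the in-memory variant lets the event flow be unit-tested without a running broker.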