Re: protocol-interactiveselenium Custom Handler

2020-06-25 Thread Sebastian Nagel
Hi Craig,

If you're building Nutch from the git repo or from the source package,
the easiest way is to put the file NewCustomHandler.java into
  src/plugin/protocol-interactiveselenium/src/java/.../handlers/
and run
  ant runtime
to compile and package Nutch, including your custom handler.

Using a jar isn't as simple, mostly because of the classpath encapsulation
of Nutch plugins.

1. add your jar as a dependency to
src/plugin/protocol-interactiveselenium/ivy.xml

2. register the file name of the jar in
src/plugin/protocol-interactiveselenium/plugin.xml
   as a <library> element inside <runtime>

3. build Nutch, see above
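
Steps 1 and 2 might look roughly like the fragments below. This is only a
sketch: the coordinates com.example / custom-handlers / 1.0 and the jar name
custom-handlers-1.0.jar are hypothetical placeholders, and the existing
entries of both files are left out.

```xml
<!-- src/plugin/protocol-interactiveselenium/ivy.xml, inside <dependencies>.
     org, name and rev are hypothetical; use your jar's real coordinates. -->
<dependency org="com.example" name="custom-handlers" rev="1.0"/>

<!-- src/plugin/protocol-interactiveselenium/plugin.xml, inside <runtime>.
     The name is hypothetical and must match the file name of the jar
     that ivy resolves. -->
<library name="custom-handlers-1.0.jar"/>
```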


Of course, ivy must be able to pick the jar from one of
the repositories listed in
  ivy/ivysettings.xml

But it's possible to add your local Maven repo/cache as an additional
resolver in ivy/ivysettings.xml.
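
As a rough sketch, a local Maven repository could be made visible to ivy
with a filesystem resolver along these lines. The resolver name local-maven2
and the chain name default are assumptions; check them against what
ivy/ivysettings.xml actually defines.

```xml
<!-- ivy/ivysettings.xml (sketch, existing content omitted) -->
<resolvers>
  <!-- Hypothetical resolver name; reads artifacts from ~/.m2/repository. -->
  <filesystem name="local-maven2" m2compatible="true">
    <ivy pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision].pom"/>
    <artifact pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision].[ext]"/>
  </filesystem>

  <!-- Assumed chain name; reference the new resolver from the chain in use. -->
  <chain name="default">
    <resolver ref="local-maven2"/>
    <!-- ... existing resolvers ... -->
  </chain>
</resolvers>
```

With m2compatible="true", ivy maps the dots in [organisation] to directory
separators, which matches the layout of a Maven repository.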

> An example of a custom handler that someone has written would be great.

There are some handler implementations in
  src/plugin/protocol-interactiveselenium/src/java/.../handlers/
I've never used them, but they look "custom", at least
at first glance, because one file name includes a typo. :)
If you have time, please open a Jira issue at
  https://issues.apache.org/jira/projects/NUTCH
to fix the naming.

Thanks,
Sebastian


On 6/25/20 1:18 AM, Craig Tataryn wrote:
> Hello, I would like to create my own Custom Handler for
> protocol-interactiveselenium.
> 
> In reading the code [1] I see that when setting the config:
> 
> 
> <property>
>   <name>interactiveselenium.handlers</name>
>   <value>NewCustomHandler,DefaultHandler</value>
>   <description></description>
> </property>
> 
> the "NewCustomHandler" would be loaded from the classpath assuming it was
> called:
> org.apache.nutch.protocol.interactiveselenium.handlers.NewCustomHandler.
> However, my question is: how do I get Nutch to incorporate my new .jar file
> containing the NewCustomHandler?
> 
> I've written protocol and indexer plugins before, however this seems a bit
> different. An example of a custom handler that someone has written would be
> great.
> 
> Thanks,
> 
> Craig.
> 
> [1] -
> https://github.com/apache/nutch/blob/ea862f45b83177b41aebad9c18b900936d43a19a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java#L364
> 



protocol-interactiveselenium Custom Handler

2020-06-24 Thread Craig Tataryn
Hello, I would like to create my own Custom Handler for
protocol-interactiveselenium.

In reading the code [1] I see that when setting the config:


<property>
  <name>interactiveselenium.handlers</name>
  <value>NewCustomHandler,DefaultHandler</value>
  <description></description>
</property>

the "NewCustomHandler" would be loaded from the classpath assuming it was
called:
org.apache.nutch.protocol.interactiveselenium.handlers.NewCustomHandler.
However, my question is: how do I get Nutch to incorporate my new .jar file
containing the NewCustomHandler?

I've written protocol and indexer plugins before, however this seems a bit
different. An example of a custom handler that someone has written would be
great.

Thanks,

Craig.

[1] -
https://github.com/apache/nutch/blob/ea862f45b83177b41aebad9c18b900936d43a19a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java#L364