[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-04-08 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2191:
-------------------------------------
Labels: memex  (was: )

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Labels: memex
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch,
> NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-03-26 Thread Karanjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanjeet Singh updated NUTCH-2191:
-----------------------------------
Attachment: NUTCH-2191.patch

[~chrismattmann] Thank you.

Patch Updated.

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Chris A. Mattmann
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch,
> NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-02-19 Thread Karanjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanjeet Singh updated NUTCH-2191:
-----------------------------------
Attachment: NUTCH-2191.patch

Updated the patch with the following plugins:

* *lib-htmlunit* - derived from the lib-selenium plugin (Selenium 2.44.0)
* *protocol-htmlunit* - uses lib-htmlunit for its dependent libraries

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-02-17 Thread Karanjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanjeet Singh updated NUTCH-2191:
-----------------------------------
Attachment: (was: NUTCH-2191.patch)

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-02-17 Thread Karanjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanjeet Singh updated NUTCH-2191:
-----------------------------------
Attachment: NUTCH-2191.patch

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-02-17 Thread Karanjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanjeet Singh updated NUTCH-2191:
-----------------------------------
Attachment: NUTCH-2191.patch

Updated the patch to include HtmlUnit from the Selenium library. This solves
the dependency problems.

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-01-05 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2191:
---------------------------------
Patch Info: Patch Available

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.





[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2015-12-24 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2191:
---------------------------------
Attachment: NUTCH-2191.patch

Patch for trunk! Although all dependencies are correctly listed in plugin.xml,
Nutch still picks up the incorrect versions from its own lib directory. To get
indexchecker and parsechecker to work, you must delete a couple of jars from
Nutch's lib directory.

I removed all http* and (jetty* or jersey*) jars and got it to work. I haven't
found a way yet for Nutch to load the dependencies from the plugin itself.
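
For anyone who wants to try the workaround without deleting anything, here is a rough sketch in Python that moves the matching jars into a backup directory instead. The function name, the backup directory and the exact glob patterns are assumptions made for illustration; only the http*, jetty* and jersey* prefixes come from the comment above, and whether every matching jar really conflicts depends on the build.

```python
import fnmatch
import shutil
from pathlib import Path

# Prefixes taken from the comment above. Treating every match as a
# conflicting jar is an assumption, so review the list before relying on it.
CONFLICT_PATTERNS = ("http*.jar", "jetty*.jar", "jersey*.jar")


def set_aside_conflicting_jars(lib_dir, backup_dir, patterns=CONFLICT_PATTERNS):
    """Move jars in lib_dir whose names match any pattern into backup_dir.

    Returns the sorted names of the jars that were moved, so the change is
    easy to review and to undo by moving the files back.
    """
    lib_dir, backup_dir = Path(lib_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before any file is moved.
    for jar in sorted(lib_dir.glob("*.jar")):
        if any(fnmatch.fnmatch(jar.name, pattern) for pattern in patterns):
            shutil.move(str(jar), str(backup_dir / jar.name))
            moved.append(jar.name)
    return moved
```

It would be called with the lib directory of the Nutch runtime being tested and any scratch directory, for example `set_aside_conflicting_jars("runtime/local/lib", "runtime/local/lib.disabled")`; both paths are placeholders and should be adjusted to the local layout.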

> Add protocol-htmlunit
> ---------------------
>
> Key: NUTCH-2191
> URL: https://issues.apache.org/jira/browse/NUTCH-2191
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.11
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.12
>
> Attachments: NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a
> portable library and should therefore be better suited to very large-scale
> crawls. This issue is an attempt to implement protocol-htmlunit.


