[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339119#comment-14339119
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

Hi [~jorgelbg], thanks for noticing this. Evidently I did not.
bq. I see that it is possible to use a phantomjs driver with selenium to provide 
headless browsing. Is there any way to configure the selenium driver used?
Please see NUTCH-1948

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: New Feature
>  Components: protocol
>Reporter: Mo Omer
>Assignee: Mohammad Al-Mohsin
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, 
> NUTCH-selenium-trunk.v2.1.patch, NUTCH-selenium-trunk.v2.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Thread Jorge Luis Betancourt Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339080#comment-14339080
 ] 

Jorge Luis Betancourt Gonzalez commented on NUTCH-1933:
---

I see a {{target}} folder in 
/nutch/trunk/src/plugin/protocol-selenium/src/target/. Is this supposed to be 
there? I also see that it is possible to use a phantomjs driver with selenium to 
provide headless browsing. Is there any way to configure the selenium driver used?



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338951#comment-14338951
 ] 

Hudson commented on NUTCH-1933:
---

SUCCESS: Integrated in Nutch-trunk #2991 (See 
[https://builds.apache.org/job/Nutch-trunk/2991/])
NUTCH-1933 nutch-selenium plugin (lewismc: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1662530)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/build.xml
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/lib-selenium
* /nutch/trunk/src/plugin/lib-selenium/build.xml
* /nutch/trunk/src/plugin/lib-selenium/ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/plugin.xml
* /nutch/trunk/src/plugin/lib-selenium/src
* /nutch/trunk/src/plugin/lib-selenium/src/java
* /nutch/trunk/src/plugin/lib-selenium/src/java/org
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
* /nutch/trunk/src/plugin/protocol-selenium
* /nutch/trunk/src/plugin/protocol-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/build.xml
* /nutch/trunk/src/plugin/protocol-selenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-selenium/src
* /nutch/trunk/src/plugin/protocol-selenium/src/java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/package.html
* /nutch/trunk/src/plugin/protocol-selenium/src/target
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit/package.html




[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337797#comment-14337797
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

My +1 to commit this - Lewis please commit

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: New Feature
>  Components: protocol
>Reporter: Mo Omer
>Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, 
> NUTCH-selenium-trunk.v2.1.patch, NUTCH-selenium-trunk.v2.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 





[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335644#comment-14335644
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

Yes, please read
https://github.com/apache/nutch/blob/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt
This explains what to do



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-24 Thread Mohammad Al-Mohsin (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335641#comment-14335641
 ] 

Mohammad Al-Mohsin commented on NUTCH-1933:
---

Thanks for your comments [~lewismc].

I am left with plugin.xml and am not sure what to do about it. Should I add the 
libraries that the plugin uses? e.g.

...




[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-24 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335538#comment-14335538
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

Hi [~almohsin], GREAT :)
Further comments
 * the following code should be moved from $NUTCH_HOME/ivy/ivy.xml, to 
$NUTCH_HOME/src/plugin/protocol-selenium/ivy.xml
{code}
+   
+   
+   
+   
+   
+   
+   
{code}
You can consult parse-tika for the implementation, please also see plugin.xml
 * You can also remove 
src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit/package.html

Once this is done I am +1 for this patch.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-23 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333809#comment-14333809
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

Hi [~almohsin], thanks for the patch. Some comments.
Files which you can remove from your patch:
 * src/plugin/lib-selenium/src/pom.xml
 * src/plugin/protocol-selenium/.idea/.name
 * src/plugin/protocol-selenium/.idea/compiler.xml
 * src/plugin/protocol-selenium/.idea/copyright/profiles_settings.xml
 * src/plugin/protocol-selenium/.idea/encodings.xml
 * src/plugin/protocol-selenium/.idea/misc.xml
 * src/plugin/protocol-selenium/.idea/modules.xml
 * src/plugin/protocol-selenium/.idea/scopes/scope_settings.xml
 * src/plugin/protocol-selenium/.idea/vcs.xml
 * src/plugin/protocol-selenium/.idea/workspace.xml
 * src/plugin/protocol-selenium/src/pom.xml

Can you resubmit a patch after updating? Thank you very much

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: New Feature
>  Components: protocol
>Reporter: Mo Omer
>Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, NUTCH-selenium-trunk.v2.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 





[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-22 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332441#comment-14332441
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

Thank you [~almohsin], I will update the patch accordingly. [~momer], good point, 
let me think about this.

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: New Feature
>  Components: protocol
>Reporter: Mo Omer
>Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 





[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-17 Thread Mo Omer (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325033#comment-14325033
 ] 

Mo Omer commented on NUTCH-1933:


Hey all,

Just making a specific call-out to have any Selenium options be configurable. 
I.e. in both nutch-selenium/selenium-grid, we wait for the Webdriver for 3 
seconds; users should be able to tweak this to their needs.
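
A minimal sketch of what a configurable wait could look like. It assumes a hypothetical property name, selenium.wait.millis, and uses java.util.Properties as a stand-in for the Nutch configuration object; neither the name nor the helper class comes from the patch. The 3-second default mirrors the wait mentioned above.

```java
import java.util.Properties;

/** Hypothetical helper: resolves the WebDriver wait time from configuration. */
public class SeleniumDelayConfig {

    // Assumed property name, for illustration only; not taken from the patch.
    public static final String DELAY_KEY = "selenium.wait.millis";

    // Mirrors the 3-second wait described in the comment above.
    public static final long DEFAULT_DELAY_MILLIS = 3000L;

    /** Returns the configured delay, or the default when the value is unset, negative or not a number. */
    public static long resolveDelayMillis(Properties conf) {
        String raw = conf.getProperty(DELAY_KEY);
        if (raw == null) {
            return DEFAULT_DELAY_MILLIS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value >= 0 ? value : DEFAULT_DELAY_MILLIS;
        } catch (NumberFormatException e) {
            return DEFAULT_DELAY_MILLIS;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveDelayMillis(conf)); // no override set
        conf.setProperty(DELAY_KEY, "5000");
        System.out.println(resolveDelayMillis(conf)); // user override
    }
}
```

Falling back to the default on bad input keeps a mistyped value from stalling or aborting a fetch; whether the plugin should instead fail fast is a design choice for the maintainers.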



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-15 Thread Mohammad Al-Mohsin (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321843#comment-14321843
 ] 

Mohammad Al-Mohsin commented on NUTCH-1933:
---

Since Nutch trunk has been updated with tika 1.7, this patch will fail to 
update ivy/ivy.xml

You will have to add this manually under the dependencies:













[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-13 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320919#comment-14320919
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

OK I am +1 on this folks. Anyone else? I can push an RC for 1.10 at the weekend 
if we can get NUTCH-1928 and NUTCH-1933 into the codebase



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313439#comment-14313439
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

My +1 to commit this, and then bring the selenium grid plugin in a separate 
patch.

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Reporter: Mo Omer
>Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 





[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310804#comment-14310804
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

[~lewismc] I think we should also include the 
[nutch-selenium-grid|https://github.com/momer/nutch-selenium-grid-plugin] 
plugin as well.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-07 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310803#comment-14310803
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

Thank you Mo for the great work on this and for engaging in the Apache site. 
You will get full credit for the work you are doing and I totally understand 
that employers funding things drives the ability to work on open source (or 
not) sometimes, and also drives how much time you have to do things. It sounds 
like you are OK with us helping to marshall this into the sources - I sincerely 
appreciate it. What I appreciate is the amazing work you did on both of these 
plugins. There are always things to do to improve work - but that's why we have 
a community and we hope you continue to engage and even if you don't, merit 
doesn't expire here at Apache, so your merit for doing this stands, whether you 
write another line of code on it or not. 



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Mo Omer (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307539#comment-14307539
 ] 

Mo Omer commented on NUTCH-1933:


That's really cool to hear; I'll check out that link Lewis. As my employer no 
longer has the client for whom the project (a sort of contextual tagging 
service which derived html content via Nutch) was built, I haven't touched or 
thought of this in a while. A couple months ago, though, I found myself 
wondering if there are any better solutions available. 

Have you all evaluated WebEngine 
(http://docs.oracle.com/javase/8/javafx/api/javafx/scene/web/WebEngine.html)? 
Or setting up some sort of dom inside v8 and calling C funcs from Java?

One small additional note: the nutch-selenium plugin should also allow the 
time-delay (basically the time allowed for the page to render - including ajax 
etc.) to be configured.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307491#comment-14307491
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

[~momer], thanks for the feedback
bq. is there a "Beginners guide to helping out with Apache projects on Jira?"
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer
It would be *fantastic* if you were able to create a patch for us with your 
[selenium-grid plugin|https://github.com/momer/nutch-selenium-grid-plugin] as 
well. We are currently evaluating selenium as a mechanism for driving JS 
interaction prior to fetching the webpage and returning it to the parser. 
Improving your plugins is where I think we are going.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Mo Omer (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307481#comment-14307481
 ] 

Mo Omer commented on NUTCH-1933:


Right on - glad you all found it useful enough to integrate. As I mentioned on 
GH, I'd definitely recommend also including the selenium-grid plugin, since 
it's a way saner approach to integrating with Selenium.

When I cobbled this together, I was under pretty hard deadline pressures, and 
left a lot of cruft in. All references/files belonging to the old html-unit 
should be removed, .idea files/directories which I'd missed in my .gitignore 
should be tossed out; HttpResponse.java should be nearly empty when completed; 
HttpWebClient should allow the tag which Selenium collects innerHtml for to be 
configured (right now it's just 'body' with no config options).
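
As a rough sketch of the last point, the tag could be read from configuration and validated before use. The property name selenium.content.tag and the helper class are hypothetical, and java.util.Properties stands in for the Nutch configuration object; only the 'body' default reflects what is described above.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper: picks the HTML tag whose inner HTML is collected. */
public class ContentTagConfig {

    // Assumed property name, for illustration only; not taken from the plugin.
    public static final String TAG_KEY = "selenium.content.tag";

    // The tag the plugin currently uses, per the comment above.
    public static final String DEFAULT_TAG = "body";

    /** Returns a lower-cased tag name, or the default when the value is unset or not a plain tag name. */
    public static String resolveTag(Properties conf) {
        String raw = conf.getProperty(TAG_KEY, DEFAULT_TAG).trim().toLowerCase(Locale.ROOT);
        // Accept only simple names such as "body", "main" or "h1".
        return raw.matches("[a-z][a-z0-9]*") ? raw : DEFAULT_TAG;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(TAG_KEY, "Main");
        // With Selenium on the classpath, the resolved tag could be passed to
        // driver.findElement(By.tagName(tag)).getAttribute("innerHTML").
        System.out.println(resolveTag(conf));
    }
}
```

Restricting the value to a plain tag name keeps the option narrow; supporting arbitrary CSS selectors would be a larger change than the one requested here.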

This, and some Hadoop work a couple weeks after putting this together, was 
really the first time I'd used Java (outside of JRuby, which doesn't 
really count), so I apologize for the wack code smells I left in.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307372#comment-14307372
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

Hi Folks, additionally we started a [wiki 
document|https://wiki.apache.org/nutch/AdvancedAjaxInteraction] which brings 
some more context to this issue. We will be populating this further as work 
goes on.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307332#comment-14307332
 ] 

Chris A. Mattmann commented on NUTCH-1933:
--

So we should note that [~momer] I believe was the one who started this plugin. 
I've been talking with Mo about getting this contributed to Apache: 

https://github.com/momer/nutch-selenium/commit/029907b45ff65679c41f334f0f3ff16afb7acc07

So, I asked Mo to come over here and take a look at Lewis's patch. Thanks all.

> nutch-selenium plugin
> -
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Reporter: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] 
> plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing 
> on my system indicates that it works. 





[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307324#comment-14307324
 ] 

Lewis John McGibbney commented on NUTCH-1933:
-

arrgh..., it appears to be utter garbage Markus. Sorry about that.



[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-05 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307136#comment-14307136
 ] 

Markus Jelsma commented on NUTCH-1933:
--

Hey, what's this? 
src/plugin/protocol-selenium/.idea
