[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339119#comment-14339119 ]

Lewis John McGibbney commented on NUTCH-1933:

Hi [~jorgelbg], thanks for noticing this. Evidently I did not.

bq. I see that it is possible to use a phantomjs driver with selenium to provide headless browsing. Is there any way to configure the selenium driver used?

Please see NUTCH-1948.

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Reporter: Mo Omer
> Assignee: Mohammad Al-Mohsin
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, NUTCH-selenium-trunk.v2.1.patch, NUTCH-selenium-trunk.v2.patch
>
>
> I updated the [nutch-selenium|https://github.com/momer/nutch-selenium] plugin to run against trunk.
> I feel that there is a good bit of work to be done here; however, early testing on my system shows that it works.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339080#comment-14339080 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-1933:

I see a {{target}} folder in /nutch/trunk/src/plugin/protocol-selenium/src/target/. Is this supposed to be there?

I see that it is possible to use a phantomjs driver with selenium to provide headless browsing. Is there any way to configure the selenium driver used?
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338951#comment-14338951 ]

Hudson commented on NUTCH-1933:

SUCCESS: Integrated in Nutch-trunk #2991 (See [https://builds.apache.org/job/Nutch-trunk/2991/])
NUTCH-1933 nutch-selenium plugin (lewismc: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1662530)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/build.xml
* /nutch/trunk/ivy/ivy.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/lib-selenium
* /nutch/trunk/src/plugin/lib-selenium/build.xml
* /nutch/trunk/src/plugin/lib-selenium/ivy.xml
* /nutch/trunk/src/plugin/lib-selenium/plugin.xml
* /nutch/trunk/src/plugin/lib-selenium/src
* /nutch/trunk/src/plugin/lib-selenium/src/java
* /nutch/trunk/src/plugin/lib-selenium/src/java/org
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
* /nutch/trunk/src/plugin/protocol-selenium
* /nutch/trunk/src/plugin/protocol-selenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/build.xml
* /nutch/trunk/src/plugin/protocol-selenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-selenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-selenium/src
* /nutch/trunk/src/plugin/protocol-selenium/src/java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/Http.java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
* /nutch/trunk/src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/package.html
* /nutch/trunk/src/plugin/protocol-selenium/src/target
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit
* /nutch/trunk/src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit/package.html
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337797#comment-14337797 ]

Chris A. Mattmann commented on NUTCH-1933:

My +1 to commit this. Lewis, please commit.

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Reporter: Mo Omer
> Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, NUTCH-selenium-trunk.v2.1.patch, NUTCH-selenium-trunk.v2.patch
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335644#comment-14335644 ]

Lewis John McGibbney commented on NUTCH-1933:

Yes, please read https://github.com/apache/nutch/blob/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt
This explains what to do.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335641#comment-14335641 ]

Mohammad Al-Mohsin commented on NUTCH-1933:

Thanks for your comments [~lewismc].
I am left with plugin.xml and am not sure what to do about it! Should I add the libraries that the plugin uses? e.g. ...
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335538#comment-14335538 ]

Lewis John McGibbney commented on NUTCH-1933:

Hi [~almohsin], GREAT :)
Further comments:
* The following code should be moved from $NUTCH_HOME/ivy/ivy.xml to $NUTCH_HOME/src/plugin/protocol-selenium/ivy.xml
{code}
+ + + + + + +
{code}
You can consult parse-tika for the implementation; please also see plugin.xml.
* You can also remove src/plugin/protocol-selenium/src/target/classes/org/apache/nutch/protocol/htmlunit/package.html

Once this is done I am +1 for this patch.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333809#comment-14333809 ]

Lewis John McGibbney commented on NUTCH-1933:

Hi [~almohsin], thanks for the patch. Some comments.
Files which you can remove from your patch:
* src/plugin/lib-selenium/src/pom.xml
* src/plugin/protocol-selenium/.idea/.name
* src/plugin/protocol-selenium/.idea/compiler.xml
* src/plugin/protocol-selenium/.idea/copyright/profiles_settings.xml
* src/plugin/protocol-selenium/.idea/encodings.xml
* src/plugin/protocol-selenium/.idea/misc.xml
* src/plugin/protocol-selenium/.idea/modules.xml
* src/plugin/protocol-selenium/.idea/scopes/scope_settings.xml
* src/plugin/protocol-selenium/.idea/vcs.xml
* src/plugin/protocol-selenium/.idea/workspace.xml
* src/plugin/protocol-selenium/src/pom.xml

Can you resubmit a patch after updating?
Thank you very much

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Reporter: Mo Omer
> Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch, NUTCH-selenium-trunk.v2.patch
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332441#comment-14332441 ]

Chris A. Mattmann commented on NUTCH-1933:

Thank you [~almohsin], I will update the patch accordingly. [~momer], good point, let me think about this.

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Reporter: Mo Omer
> Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325033#comment-14325033 ]

Mo Omer commented on NUTCH-1933:

Hey all,
Just making a specific call-out to have any Selenium options be configurable. E.g., in both nutch-selenium and selenium-grid, we wait for the WebDriver for 3 seconds; users should be able to tweak this to their needs.
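A minimal sketch of what a configurable wait could look like, for illustration only. It is not taken from the plugin source: the property name {{selenium.page.wait.seconds}} and the class {{SeleniumWaitConfig}} are hypothetical, and plain {{java.util.Properties}} stands in for whatever configuration object the plugin actually reads. The only value taken from the thread is the 3-second default.

```java
import java.time.Duration;
import java.util.Properties;

public class SeleniumWaitConfig {
    // Hypothetical property name, chosen for this example only.
    static final String WAIT_KEY = "selenium.page.wait.seconds";
    // The 3-second wait mentioned in the comment, kept as the fallback.
    static final long DEFAULT_WAIT_SECONDS = 3;

    // Returns the configured wait, or the default when the value is
    // missing, blank, negative, or not a number.
    static Duration pageWait(Properties conf) {
        Duration fallback = Duration.ofSeconds(DEFAULT_WAIT_SECONDS);
        String raw = conf.getProperty(WAIT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            long seconds = Long.parseLong(raw.trim());
            return seconds < 0 ? fallback : Duration.ofSeconds(seconds);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(pageWait(conf).getSeconds()); // nothing configured
        conf.setProperty(WAIT_KEY, "10");
        System.out.println(pageWait(conf).getSeconds()); // user override
    }
}
```

Falling back to the default on bad input, rather than failing, is a design choice made here so that a typo in the configuration does not stop a crawl; a real implementation might prefer to log a warning as well.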
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321843#comment-14321843 ]

Mohammad Al-Mohsin commented on NUTCH-1933:

Since Nutch trunk has been updated with Tika 1.7, this patch will fail to update ivy/ivy.xml.
You will have to add this manually under the dependencies:
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320919#comment-14320919 ]

Lewis John McGibbney commented on NUTCH-1933:

OK, I am +1 on this, folks. Anyone else?
I can push an RC for 1.10 at the weekend if we can get NUTCH-1928 and NUTCH-1933 into the codebase.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313439#comment-14313439 ]

Chris A. Mattmann commented on NUTCH-1933:

My +1 to commit this, and then bring the selenium grid plugin in a separate patch.

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Reporter: Mo Omer
> Assignee: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310804#comment-14310804 ]

Chris A. Mattmann commented on NUTCH-1933:

[~lewismc] I think we should also include the [nutch-selenium-grid|https://github.com/momer/nutch-selenium-grid-plugin] plugin.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310803#comment-14310803 ]

Chris A. Mattmann commented on NUTCH-1933:

Thank you Mo for the great work on this and for engaging in the Apache site. You will get full credit for the work you are doing, and I totally understand that employers funding things drives the ability to work on open source (or not) sometimes, and also drives how much time you have to do things. It sounds like you are OK with us helping to marshal this into the sources - I sincerely appreciate it.

What I appreciate is the amazing work you did on both of these plugins. There are always things to do to improve work, but that's why we have a community, and we hope you continue to engage. Even if you don't, merit doesn't expire here at Apache, so your merit for doing this stands, whether you write another line of code on it or not.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307539#comment-14307539 ]

Mo Omer commented on NUTCH-1933:

That's really cool to hear; I'll check out that link, Lewis.

As my employer no longer has the client for whom the project (a sort of contextual tagging service which derived HTML content via Nutch) was built, I haven't touched or thought of this in a while. A couple of months ago, though, I found myself wondering if there are any better solutions available. Have you all evaluated WebEngine (http://docs.oracle.com/javase/8/javafx/api/javafx/scene/web/WebEngine.html)? Or setting up some sort of DOM inside V8 and calling C funcs from Java?

One small additional note: the nutch-selenium plugin should also allow the time delay (basically the time allowed for the page to render, including AJAX etc.) to be configured.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307491#comment-14307491 ]

Lewis John McGibbney commented on NUTCH-1933:

[~momer], thanks for the feedback.

bq. is there a "Beginners guide to helping out with Apache projects on Jira?"

http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

It would be *fantastic* if you were able to create a patch for us with your [selenium-grid plugin|https://github.com/momer/nutch-selenium-grid-plugin] as well.
We are currently evaluating selenium as a mechanism for driving JS interaction prior to fetching the webpage and returning it to the parser. Improving your plugins is where I think we are going.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307481#comment-14307481 ]

Mo Omer commented on NUTCH-1933:

Right on - glad you all found it useful enough to integrate. As I mentioned on GH, I'd definitely recommend also including the selenium-grid plugin, since it's a way saner approach to integrating with Selenium.

When I cobbled this together, I was under pretty hard deadline pressures, and left a lot of cruft in:
* All references/files belonging to the old html-unit should be removed.
* .idea files/directories which I'd missed in my .gitignore should be tossed out.
* HttpResponse.java should be nearly empty when completed.
* HttpWebClient should allow the tag which Selenium collects innerHTML for to be configured (right now it's just 'body' with no config options).

This, and some Hadoop work a couple of weeks after putting this together, was really the first time I'd used Java (outside of JRuby, which doesn't really count), so I apologize for the wack code smells I left in.
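As a rough illustration of the last point, the tag could be resolved from configuration with 'body' as the fallback. This is only a sketch under stated assumptions: the property name {{selenium.capture.tag}} and the class {{SeleniumTagConfig}} are hypothetical and do not come from HttpWebClient, and {{java.util.Properties}} stands in for the plugin's real configuration object. The resolved name would presumably then be handed to the Selenium element lookup.

```java
import java.util.Locale;
import java.util.Properties;
import java.util.regex.Pattern;

public class SeleniumTagConfig {
    // Hypothetical property name, chosen for this example only.
    static final String TAG_KEY = "selenium.capture.tag";
    // The current hard-coded behaviour described in the comment.
    static final String DEFAULT_TAG = "body";
    // Accept simple element names only (letters, digits, hyphens).
    private static final Pattern TAG_NAME =
        Pattern.compile("[A-Za-z][A-Za-z0-9-]*");

    // Returns the configured tag name in lower case, or "body" when the
    // value is missing or does not look like an element name.
    static String captureTag(Properties conf) {
        String raw = conf.getProperty(TAG_KEY, DEFAULT_TAG).trim();
        if (!TAG_NAME.matcher(raw).matches()) {
            return DEFAULT_TAG;
        }
        return raw.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(captureTag(conf)); // nothing configured
        conf.setProperty(TAG_KEY, "MAIN");
        System.out.println(captureTag(conf)); // user override, normalised
    }
}
```

Validating the value keeps arbitrary strings out of the element lookup; whether that check is wanted, or whether a CSS selector would be a better unit of configuration than a tag name, is left open here.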
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307372#comment-14307372 ]

Lewis John McGibbney commented on NUTCH-1933:

Hi Folks, additionally we started a [wiki document|https://wiki.apache.org/nutch/AdvancedAjaxInteraction] which brings some more context to this issue.
We will be populating this further as work goes on.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307332#comment-14307332 ]

Chris A. Mattmann commented on NUTCH-1933:

We should note that [~momer], I believe, was the one who started this plugin. I've been talking with Mo about getting this contributed to Apache:
https://github.com/momer/nutch-selenium/commit/029907b45ff65679c41f334f0f3ff16afb7acc07
So, I asked Mo to come over here and take a look at Lewis's patch. Thanks all.

> nutch-selenium plugin
> ---------------------
>
> Key: NUTCH-1933
> URL: https://issues.apache.org/jira/browse/NUTCH-1933
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Reporter: Lewis John McGibbney
> Fix For: 1.10
>
> Attachments: NUTCH-selenium-trunk.patch
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307324#comment-14307324 ]

Lewis John McGibbney commented on NUTCH-1933:

Arrgh... it appears to be utter garbage, Markus. Sorry about that.
[jira] [Commented] (NUTCH-1933) nutch-selenium plugin
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307136#comment-14307136 ]

Markus Jelsma commented on NUTCH-1933:

Hey, what's this?
src/plugin/protocol-selenium/.idea