[
https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714510#comment-16714510
]
Sebastian Nagel commented on NUTCH-2676:
----------------------------------------
[~virt], thanks for the update. There is already an option to [whitelist
hosts|https://wiki.apache.org/nutch/WhiteListRobots/] (NUTCH-1927). After a
longer discussion we agreed on this approach: it makes it easy to ignore
robots.txt for a list of hosts where you are explicitly allowed to do so, but
ignoring the robots.txt standard in general would still require a change in
the source code. It's implemented in lib-http and should be available for
protocol-selenium as well (but I have never tested it there).
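For illustration, a minimal sketch of what such a whitelist entry in
conf/nutch-site.xml could look like. It assumes the property name
http.robot.rules.whitelist from NUTCH-1927; the host names are placeholders,
and whether protocol-selenium honors the setting is untested.
{code:xml}
<!-- Sketch only: property name assumed from NUTCH-1927, hosts are placeholders. -->
<property>
  <name>http.robot.rules.whitelist</name>
  <value>host1.example.org,host2.example.org</value>
  <description>Comma-separated list of hosts for which robots.txt rules are
  ignored. Only list hosts whose owners explicitly allow this.</description>
</property>
{code}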
> Update to the latest selenium and add code to use chrome and firefox headless
> mode with the remote web driver
> -------------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-2676
> URL: https://issues.apache.org/jira/browse/NUTCH-2676
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.15
> Reporter: Stas Batururimi
> Priority: Major
> Fix For: 1.16
>
> Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox &
> Chrome
> * use case with Selenium grid using docker (1 hub docker container, several
> nodes in different docker containers, Nutch in another docker container,
> streaming to Apache Solr in docker container, that is at least 4 different
> docker containers)