[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14899934#comment-14899934 ]
Sebastian Nagel commented on NUTCH-2110:
----------------------------------------

Hi Asitang,

the Injector is already able to store key-value pairs from the seed list in the CrawlDb, within the CrawlDatum's metadata, see [[1|http://nutch.apache.org/apidocs/apidocs-1.10/org/apache/nutch/crawl/Injector.html]].
If the XPath statements are not too complex, this would be the easiest way: the protocol plugin could then read the XPath from the CrawlDatum.

Regarding the "state of a selenium operation": should the state be passed on to the outlinks of a page, or is the same page fetched multiple times with varying Ajax/JavaScript actions to be performed?

> Create the capability to provide seeds in the form of "url+xpath(including option to enter search terms).selenium"
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2110
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2110
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: fetcher
>    Affects Versions: 1.10
>            Reporter: Asitang Mishra
>              Labels: memex
>
> Create the capability to provide seeds in the form of "url+xpath(including
> option to enter search terms).selenium" to be used by selenium
> protocols/plugins as urls/flow to reach a specific Ajax-based page or save
> the state of a selenium operation for the next fetching round.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
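
For illustration, the seed-list metadata mechanism mentioned in the comment can be sketched as follows. The Injector documentation describes seed lines as a URL followed by tab-separated key=value pairs. The snippet below is a plain Python stand-in for that parsing, not Nutch's own code, and the keys `xpath` and `searchTerm` as well as the example URL are hypothetical names chosen for this sketch. It splits each pair on the first "=" only, so an XPath that itself contains "=" survives intact; an XPath containing a tab would not.

```python
def parse_seed_line(line):
    """Split one seed-list line into (url, metadata dict).

    Assumed format: URL, then tab-separated key=value pairs.
    """
    fields = line.rstrip("\n").split("\t")
    url = fields[0].strip()
    metadata = {}
    for field in fields[1:]:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = field.partition("=")
        if not sep:
            continue  # skip fields that are not key=value pairs
        metadata[key.strip()] = value.strip()
    return url, metadata


# Hypothetical seed line: the "xpath" and "searchTerm" keys are assumptions.
seed = "http://example.com/search\txpath=//input[@name='q']\tsearchTerm=nutch"
url, meta = parse_seed_line(seed)
print(url)
print(meta["xpath"])
```

In Nutch itself, a protocol plugin would presumably look up such a key in the CrawlDatum's metadata rather than re-parse the seed file; how the XPath is then applied by Selenium is left open here.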