Update:

Selenium's latest version (2.44.0) doesn’t seem to work with the latest Firefox
(version 35), so I installed Firefox 29 and it’s crawling properly now.
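Here is a minimal Java sketch of one way to catch this mismatch before a crawl starts. It assumes only what this update reports: Firefox 29 worked with Selenium 2.44.0 and Firefox 35 did not. Versions in between are treated as untested, not as known-bad. The fallback path is hypothetical. webdriver.firefox.bin is the system property that Selenium 2.x's FirefoxDriver reads to locate the Firefox binary. In a real crawl it would have to be set inside the Nutch JVM (for example as a -D JVM option), not in a separate program like this one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: flags a Firefox build newer than the one reported in this
// thread as working with Selenium 2.44.0 (Firefox 29). If it is newer, the
// sketch points Selenium 2.x at an older side-by-side install via the
// webdriver.firefox.bin system property.
class FirefoxVersionGuard {
    // Newest major version reported to crawl correctly in this thread.
    static final int LAST_REPORTED_WORKING_MAJOR = 29;

    // Extracts the major version from text such as "Mozilla Firefox 35.0.1".
    static int majorVersion(String versionOutput) {
        Matcher m = Pattern.compile("(\\d+)\\.\\d+").matcher(versionOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no version found in: " + versionOutput);
        }
        return Integer.parseInt(m.group(1));
    }

    // True when the installed major version is newer than the one reported to work.
    static boolean isUntested(String versionOutput) {
        return majorVersion(versionOutput) > LAST_REPORTED_WORKING_MAJOR;
    }

    public static void main(String[] args) {
        // Example input; in practice, pass the output of `firefox --version`.
        String installed = args.length > 0 ? String.join(" ", args) : "Mozilla Firefox 35.0.1";
        if (isUntested(installed)) {
            // HYPOTHETICAL path to a Firefox 29 install kept next to the current one.
            System.setProperty("webdriver.firefox.bin",
                    "/Applications/Firefox29.app/Contents/MacOS/firefox");
            System.out.println("Firefox " + majorVersion(installed)
                    + " is newer than 29; using " + System.getProperty("webdriver.firefox.bin"));
        } else {
            System.out.println("Firefox " + majorVersion(installed) + " is not newer than 29");
        }
    }
}
```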
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha <bagre...@usc.edu> wrote:
> 
> Thanks, Jiaxin!
> 
> I repeated the entire installation procedure and I think I have
> installed it correctly (ant runtime reported BUILD SUCCESSFUL, and the
> runtime/local/lib folder contains the Selenium jar files).
> 
> When I started crawling, the Mozilla Firefox browser popped up twice, but when I
> checked the crawl statistics, it had fetched no URLs. (Did anyone else have this problem?)
> 
> I got the following error while crawling:
> 
> org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 
> 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
> h changes to installed add-ons
> 1424295898279 addons.xpi-utils        DEBUG   Updating add-on states
> 1424295898281 addons.xpi-utils        DEBUG   Writing add-ons list
> 1424295898291 addons.manager  DEBUG   Registering shutdown blocker for 
> XPIProvider
> 1424295898292 addons.manager  DEBUG   Registering shutdown blocker for 
> LightweightThemeManager
> 1424295898295 addons.manager  DEBUG   Registering shutdown blocker for 
> OpenH264Provider
> 1424295898296 addons.manager  DEBUG   Registering shutdown blocker for 
> PluginProvider
> 1424295898775 DeferredSave.extensions.json    DEBUG   Starting timer
> 1424295898800 DeferredSave.extensions.json    DEBUG   Starting write
> 1424295898858 addons.manager  DEBUG   shutdown
> 1424295898859 addons.manager  DEBUG   Calling shutdown blocker for XPIProvider
> 1424295898859 addons.xpi      DEBUG   shutdown
> 1424295898860 addons.xpi-utils        DEBUG   shutdown
> 1424295898861 addons.manager  DEBUG   Calling shutdown blocker for 
> LightweightThemeManager
> 1424295898862 addons.manager  DEBUG   Calling shutdown blocker for 
> OpenH264Provider
> 1424295898864 addons.manager  DEBUG   Calling shutdown blocker for 
> PluginProvider
> 1424295899016 DeferredSave.extensions.json    DEBUG   Write succeeded
> 1424295899016 addons.xpi-utils        DEBUG   XPI Database saved, setting 
> schema version preference to 16
> 1424295899017 addons.xpi      DEBUG   Notifying XPI shutdown observers
> 1424295899025 addons.manager  DEBUG   Async provider shutdown done
> 1424295900455 addons.manager  DEBUG   Loaded provider scope for 
> resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
> 1424295900459 addons.manager  DEBUG   Loaded provider scope for 
> resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
> 1424295900468 addons.xpi      DEBUG   startup
> 1424295900470 addons.xpi      INFO    Mapping fxdri...@googlecode.com to 
> /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdri...@googlecode.com
> 1424295900471 addons.xpi      DEBUG   Ignoring file entry whose name is not a 
> valid add-on ID: 
> /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
> 1424295900472 addons.xpi      INFO    Mapping 
> {972ce4c6-7e08-4474-a285-3208198ce6fd} to 
> /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900473 addons.xpi      DEBUG   Skipping unavailable install location 
> app-system-share
> 1424295900475 addons.xpi      DEBUG   checkForChanges
> 1424295900476 addons.xpi      DEBUG   Loaded add-on state from prefs: 
> {"app-profile":{"fxdri...@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdri...@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900480 addons.xpi      DEBUG   getModTime: Recursive scan of 
> {972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900483 addons.xpi      DEBUG   getInstallState changed: false, state: 
> {"app-profile":{"fxdri...@googlecode.com":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/fxdri...@googlecode.com","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900488 addons.xpi      DEBUG   No changes found
> 1424295900502 addons.manager  DEBUG   Registering shutdown blocker for 
> XPIProvider
> 1424295900504 addons.manager  DEBUG   Registering shutdown blocker for 
> LightweightThemeManager
> 1424295900507 addons.manager  DEBUG   Registering shutdown blocker for 
> OpenH264Provider
> 1424295900508 addons.manager  DEBUG   Registering shutdown blocker for 
> PluginProvider
> *** Blocklist::_preloadBlocklistFile: blocklist is disabled
> 1424295903113 addons.manager  DEBUG   Registering shutdown blocker for 
> <unnamed-provider>
> 
>       at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
>       at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
>       at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
>       at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
>       at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
>       at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
>       at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
>       at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
>       at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
>       at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
>       at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
>       at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
>       at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
>       at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
>       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
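The NotConnectedException means that nothing inside the launched Firefox accepted a connection on 127.0.0.1:7055 within 45000 ms. The Java sketch below is a standalone probe, not part of Nutch or Selenium. You could run it in a second terminal while the fetcher is waiting, to see whether anything ever starts listening on that port. The class name and polling interval are illustrative choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: reports whether anything accepts TCP connections on the port named
// in the NotConnectedException (7055 unless another port is passed in).
class DriverPortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // handshake succeeded, so something is listening
        } catch (IOException e) {
            return false;  // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 7055;
        // Poll once per second for 45 attempts, roughly matching the 45000 ms in the error.
        for (int attempt = 1; attempt <= 45; attempt++) {
            if (isListening("127.0.0.1", port, 1000)) {
                System.out.println("127.0.0.1:" + port + " is accepting connections");
                return;
            }
            Thread.sleep(1000);
        }
        System.out.println("Nothing listened on 127.0.0.1:" + port + " during the polling window");
    }
}
```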
> 
>> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <jiaxi...@usc.edu> wrote:
>> 
>> Hi,
>> 
>> When you installed the patch, did you see any failures? No failure can be
>> tolerated. My guess is that something is wrong with ivy.xml. I suggest
>> checking out ALL files in Nutch again and then retrying.
>> 
>> Best,
>> Jiaxin
>> 
>> On Tuesday, February 17, 2015, Jaydeep Bagrecha <bagre...@usc.edu> wrote:
>> Hi all,
>>      I am trying to install and build Selenium with Nutch 1.10 on Mac OS X
>> Yosemite.
>> 
>> I am getting the following errors after downloading the Selenium patch
>> (https://issues.apache.org/jira/browse/NUTCH-1933) and running the “ant
>> runtime” command (as Jiaxin described below). Any suggestions on how to avoid them?
>> 
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.By;
>>     [javac]                           ^
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.WebDriver;
>>     [javac]                           ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>>     [javac]                                   ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
>> error: cannot find symbol
>>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new 
>> ThreadLocal<WebDriver>() {
>>     [javac]                             ^
>>     [javac]   symbol:   class WebDriver
>>     [javac]   location: class HttpWebClient
>>  error: cannot find symbol
>>     [javac]     protected WebDriver initialValue()
>>     [javac]               ^
>>     [javac]   symbol: class WebDriver
>>  error: cannot find symbol
>>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>>     [javac]       ^
>>     [javac]   symbol: class FirefoxProfile
>> error: cannot find symbol
>>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>>     [javac]                              ^
>>     [javac]   symbol: class FirefoxDriver
>>  error: cannot find symbol
>>     [javac]       driver = new FirefoxDriver();
>>     [javac]                    ^
>>     [javac]   symbol:   class FirefoxDriver
>>     [javac]   location: class HttpWebClient
>> 
>>  error: cannot find symbol
>>     [javac]       new WebDriverWait(driver, 3);
>>     [javac]           ^
>>     [javac]   symbol:   class WebDriverWait
>>     [javac]   location: class HttpWebClient
>> 
>>  error: cannot find symbol
>>     [javac]       String innerHtml = 
>> driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>>     [javac]                                             ^
>>     [javac]   symbol:   variable By
>>     [javac]   location: class HttpWebClient
>> 
>> Thanks,
>> Jaydeep
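These javac errors mean the Selenium classes were not on the compile classpath, which fits Jiaxin's suspicion about ivy.xml. After rebuilding, the hypothetical helper below offers a rough check. It tries to load each class named in the errors and lists the ones it cannot find. It is a diagnostic sketch only and does not fix the Ivy configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: reports which Selenium classes named in the javac errors cannot be
// loaded from the classpath this program is started with.
class SeleniumClasspathCheck {
    static final String[] REQUIRED = {
        "org.openqa.selenium.By",
        "org.openqa.selenium.WebDriver",
        "org.openqa.selenium.firefox.FirefoxDriver",
        "org.openqa.selenium.firefox.FirefoxProfile",
        "org.openqa.selenium.support.ui.WebDriverWait",
    };

    // Returns the subset of classNames that could not be loaded.
    static List<String> missing(String... classNames) {
        List<String> absent = new ArrayList<>();
        ClassLoader loader = SeleniumClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: locate and load the class without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        List<String> absent = missing(REQUIRED);
        if (absent.isEmpty()) {
            System.out.println("All Selenium classes were found on the classpath.");
        } else {
            System.out.println("Missing from the classpath:");
            absent.forEach(name -> System.out.println("  " + name));
        }
    }
}
```

For example, from a directory containing the compiled class, you could run: java -cp "/path/to/nutch/runtime/local/lib/*:." SeleniumClasspathCheck. The runtime/local/lib location is the one mentioned earlier in this thread. On Windows, use ; instead of : as the separator.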
>> 
>>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <jiaxi...@usc.edu> wrote:
>>> 
>>> Sure. I will do it once I confirm it works...
>>> 
>>> On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>> This is great, Jiaxin. Can you please make a page on the Nutch
>>> wiki with this information?
>>> 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:  http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 
>>> -----Original Message-----
>>> From: Jiaxin Ye <jiaxi...@usc.edu>
>>> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> Date: Thursday, February 12, 2015 at 9:39 PM
>>> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> Subject: Nutch-Selenium in Nutch 1.10
>>> 
>>> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
>>> >Nutch with Selenium and Firefox. I have a question though: does your
>>> >Firefox always pop up for every URL we crawl?
>>> >
>>> >
>>> >Hi Prof. Mattmann. I think this is how to install Selenium on a Mac running
>>> >OS X newer than 10.6 (I think)...
>>> >
>>> >
>>> >1. Download XQuartz; it's a .dmg file, so install it directly.
>>> >2. Download Nutch 1.10.
>>> >3. Download the patch and put it in the Nutch project directory.
>>> >4. patch -p0 < THE PATCH NAME
>>> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
>>> >GitHub tells you to. The patch already updates those .xml files for
>>> >us, and it also installs lib-selenium and protocol-selenium for us
>>> >(correct me if I am wrong).
>>> >6. Update the Tika dependency if needed.
>>> >9. Open a new terminal and type
>>> >    Xvfb -screen 0 1024x768x24 (I think you can make it smaller if you
>>> >want...)
>>> >    You may see some errors after entering the command (I did). If so,
>>> >create the /tmp/.X11-unix folder with sudo, set its mode to 1777,
>>> >and rerun the command. Xvfb should then work.
>>> >10. Go to nutch/runtime/local and run the crawl command.
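Step 9 starts a virtual X display. On setups where Firefox is an X11 client, such as the headless Ubuntu/Vagrant box mentioned later in this thread, programs find that display through the DISPLAY environment variable. The Java sketch below ties steps 9 and 10 together under several assumptions:
- display :1 is free;
- the conventional "Xvfb :N -screen 0 WxHxD" syntax is used;
- the program is launched from the Nutch project directory;
- the crawl command is passed as arguments.

XvfbLauncher, xvfbCommand and withDisplay are illustrative names, not part of Nutch.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: starts Xvfb on a chosen display, then runs the given command with
// DISPLAY pointing at that virtual screen, and stops Xvfb afterwards.
class XvfbLauncher {
    // Builds "Xvfb :<display> -screen 0 <width>x<height>x<depth>".
    static List<String> xvfbCommand(int display, int width, int height, int depth) {
        return Arrays.asList("Xvfb", ":" + display, "-screen", "0",
                width + "x" + height + "x" + depth);
    }

    // Returns a ProcessBuilder for command whose environment has DISPLAY=:<display>.
    static ProcessBuilder withDisplay(List<String> command, int display) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("DISPLAY", ":" + display);
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: XvfbLauncher <crawl command and arguments...>");
            System.exit(2);
        }
        int display = 1; // assumption: display :1 is not already in use
        Process xvfb = new ProcessBuilder(xvfbCommand(display, 1024, 768, 24)).inheritIO().start();
        try {
            Thread.sleep(2000); // crude wait for Xvfb to finish starting
            // Assumption: this program is launched from the Nutch project directory.
            int exit = withDisplay(Arrays.asList(args), display)
                    .directory(new File("runtime/local"))
                    .inheritIO()
                    .start()
                    .waitFor();
            System.out.println("crawl exited with status " + exit);
        } finally {
            xvfb.destroy(); // stop the virtual display once the crawl ends
        }
    }
}
```

Usage would be along the lines of: java XvfbLauncher bin/crawl followed by the crawl arguments you normally use in runtime/local.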
>>> >
>>> >
>>> >Hope it helps. :)
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
>>> ><sli...@usc.edu> wrote:
>>> >
>>> >I think I may have finished installing.
>>> >
>>> >
>>> >What you need to do:
>>> >0. Run git status and check out (revert) any files you have modified.
>>> >1. patch -p0 < YOUR_PATCH_FILE
>>> >2. ant clean jar
>>> >3. ant runtime
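Steps 1 to 3 can also be scripted so the build stops at the first failing step, in line with Jiaxin's point that no failure can be tolerated. This Java sketch is illustrative only; NutchPatchBuild and runInOrder are made-up names. Step 0 is left manual because checking out modified files discards local changes. Note that "patch -p0 < FILE" becomes an input redirect rather than a shell "<".

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: runs the patch-and-build steps in order, stopping at the first
// command that exits with a non-zero status.
class NutchPatchBuild {
    // Runs each step; returns the index of the first failing step, or -1 if all succeed.
    static int runInOrder(List<ProcessBuilder> steps) throws IOException, InterruptedException {
        for (int i = 0; i < steps.size(); i++) {
            ProcessBuilder pb = steps.get(i);
            // Show output live; stdin is left alone so patch can read the patch file.
            pb.redirectOutput(ProcessBuilder.Redirect.INHERIT)
              .redirectError(ProcessBuilder.Redirect.INHERIT);
            int exit = pb.start().waitFor();
            if (exit != 0) {
                System.err.println("Step failed (exit " + exit + "): " + String.join(" ", pb.command()));
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: NutchPatchBuild <nutch-dir> <patch-file>");
            System.exit(2);
        }
        File nutchDir = new File(args[0]);
        File patchFile = new File(args[1]);
        List<ProcessBuilder> steps = Arrays.asList(
                new ProcessBuilder("patch", "-p0").redirectInput(patchFile), // patch -p0 < YOUR_PATCH_FILE
                new ProcessBuilder("ant", "clean", "jar"),
                new ProcessBuilder("ant", "runtime"));
        for (ProcessBuilder pb : steps) {
            pb.directory(nutchDir); // every step runs from the Nutch project directory
        }
        System.exit(runInOrder(steps) == -1 ? 0 : 1);
    }
}
```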
>>> >
>>> >
>>> >I will try crawling with Selenium later. Hope this helps. >_<
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
>>> ><chris.a.mattm...@jpl.nasa.gov> wrote:
>>> >
>>> >Yes, I believe you need to install X11. Why don't you try it and report back
>>> >what you find? Thanks.
>>> >
>>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxi...@usc.edu> wrote:
>>> >
>>> >
>>> >
>>> >Hi Professor, but can we use Selenium on a Mac?
>>> >
>>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>>> ><chris.a.mattm...@jpl.nasa.gov> wrote:
>>> >
>>> >You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
>>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>>> >
>>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>>> >are here:
>>> >
>>> >https://issues.apache.org/jira/browse/NUTCH-1933
>>> >
>>> >
>>> >Cheers,
>>> >Chris
>>> >
>>> >
>>> >-----Original Message-----
>>> >From: Jiaxin Ye <jiaxi...@usc.edu>
>>> >Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> >Date: Thursday, February 12, 2015 at 12:46 AM
>>> >To: "dev@nutch.apache.org" <dev@nutch.apache.org>
>>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>>> >
>>> >>Well, good choice. I am thinking of switching to Ubuntu now. But why
>>> >>do we need Selenium anyway? Just to make crawling easier?
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> >><sli...@usc.edu> wrote:
>>> >>
>>> >>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
>>> >>using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still
>>> >>be installed properly. The issue is that I don't know how to integrate
>>> >>Selenium with Nutch 1.10.
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> >><jiaxi...@usc.edu> wrote:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>
>>> >>Does anyone here know where to find a setup tutorial for Selenium on Mac?
>>> >>I'm finding it difficult to install Xvfb on a Mac.
>>> >>
>>> >>
>>> >>Best,
>>> >>Jiaxin
>>> >>
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>>> >><sapna...@usc.edu> wrote:
>>> >>
>>> >>Hi Shuo Li,
>>> >>
>>> >>
>>> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
>>> >>patch for Selenium on Nutch 1.10:
>>> >>https://issues.apache.org/jira/browse/NUTCH-1933
>>> >>
>>> >>
>>> >>Hope this helps!
>>> >>
>>> >>
>>> >>Thanks,
>>> >>Sapna
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>>> >><sli...@usc.edu> wrote:
>>> >>
>>> >>Yop,
>>> >>
>>> >>
>>> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> >>up:
>>> >>
>>> >>
>>> >>error: package org.apache.nutch.storage does not exist
>>> >>
>>> >>
>>> >>
>>> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>>> >>in 1.10?
>>> >>
>>> >>
>>> >>Any advice would be appreciated.
>>> >>
>>> >>
>>> >>Regards,
>>> >>Shuo Li
>>> >>
>>> >>--
>>> >>Graduate Student
>>> >>MS in CS (Data Science)
>>> >>Viterbi School of Engineering
>>> >>University of Southern California
>>> >>
>>> >>
>>> >>Phone:
>>> >>+1 650-307-9848
>>> >>
>>> 
>> 
> 
