Sure. I will do it once I confirm it works...

On Thursday, February 12, 2015, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
> This is great, Jiaxin, can you please make a wiki page on the Nutch
> wiki that has this information?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Jiaxin Ye <jiaxi...@usc.edu>
> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Date: Thursday, February 12, 2015 at 9:39 PM
> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: Nutch-Selenium in Nutch 1.10
>
> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
> >Nutch with Selenium and Firefox. I have a question though: does your
> >Firefox pop up for all the URLs we crawled?
> >
> >Hi Prof Mattmann. I think this is how to install Selenium on a Mac
> >running an OS newer than 10.6:
> >
> >1. Download XQuartz. It's a dmg file; install it directly.
> >2. Download Nutch 1.10.
> >3. Download the patch and put it in the Nutch project directory.
> >4. patch -p0 < THE PATCH NAME
> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial
> >on GitHub tells you to. The patch already updates those .xml files for
> >us. The patch also installs lib-selenium and protocol-selenium for us
> >(correct me if I am wrong).
> >6. Update the Tika dependency if needed.
> >7. Go to the Nutch project directory and run ant runtime.
> >8. Download Firefox.
> >9. Open a new terminal and type
> >   xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
> >want...)
> >   There should be some errors after entering the command (for me at
> >least). Manually create a /tmp/.X11-unix folder with sudo, and then set
> >its mode to 1777. Rerun the command. xvfb should be working.
> >10. Go to nutch > runtime > local and run the crawling command.
> >
> >Hope it helps. :)
> >
> >Best,
> >Jiaxin
> >
> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <sli...@usc.edu> wrote:
> >
> >I think I have possibly finished installing.
> >
> >What you need to do:
> >0. git status and checkout what you have modified.
> >1. patch -p0 < YOUR_PATCH_FILE
> >2. ant clean jar
> >3. ant runtime
> >
> >Will try crawling using Selenium later on. Hope this helped. >_<
> >
> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
> ><chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >Yes, I believe you need to install X11. Why don't you try and report back
> >what you find? Thanks.
> >
> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxi...@usc.edu> wrote:
> >
> >Hi professor, but can we use Selenium on Mac?
> >
> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
> ><chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >You need Selenium, Jiaxin, in order to crawl dynamic pages in the
> >polar dataset you have been assigned in my CSCI 572 search engines class.
> >
> >The instructions for integrating Selenium with Nutch 1.10-trunk
> >are here:
> >
> >https://issues.apache.org/jira/browse/NUTCH-1933
> >
> >Cheers,
> >Chris
> >
> >-----Original Message-----
> >From: Jiaxin Ye <jiaxi...@usc.edu>
> >Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Date: Thursday, February 12, 2015 at 12:46 AM
> >To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >Subject: Re: Nutch-Selenium in Nutch 1.10
> >
> >>Well, good choice. I am thinking of changing to Ubuntu now. The thing is,
> >>why do we need Selenium anyway? Just to make crawling easier?
> >>
> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li <sli...@usc.edu> wrote:
> >>
> >>Interestingly, I'm a Mac user but I don't want to screw up my laptop, so
> >>I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI but Xvfb can
> >>still be installed properly. The issue is that I don't know how to
> >>integrate Selenium with Nutch 1.10.
> >>
> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye <jiaxi...@usc.edu> wrote:
> >>
> >>Hi all,
> >>
> >>Does anyone here know where to find the setup tutorial for Selenium on
> >>Mac? I find it difficult to install Xvfb on Mac.
> >>
> >>Best,
> >>Jiaxin
> >>
> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh <sapna...@usc.edu> wrote:
> >>
> >>Hi Shuo Li,
> >>
> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
> >>patch for Selenium on Nutch 1.10:
> >>https://issues.apache.org/jira/browse/NUTCH-1933.
> >>
> >>Hope this helps!
> >>
> >>Thanks,
> >>Sapna
> >>
> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li <sli...@usc.edu> wrote:
> >>
> >>Yop,
> >>
> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
> >>up:
> >>
> >>error: package org.apache.nutch.storage does not exist
> >>
> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
> >>in 1.10?
> >>
> >>Any advice would be appreciated.
> >>
> >>Regards,
> >>Shuo Li
> >>
> >>--
> >>Graduate Student
> >>MS in CS (Data Science)
> >>Viterbi School of Engineering
> >>University of Southern California
> >>
> >>Phone:
> >>+1 650-307-9848
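
For anyone following step 9 of the install notes in this thread, the /tmp/.X11-unix fix might be scripted roughly as below. This is only a sketch: the function name ensure_x11_socket_dir and its optional path argument are made up for illustration (the argument exists so the function can be tried on a scratch directory without sudo), and nothing here has been checked against the patch itself.

```shell
#!/bin/sh
# Sketch of the step 9 workaround from the thread: create the X11 socket
# directory and give it mode 1777 (rwx for everyone plus the sticky bit).
# The function name and the optional path argument are assumptions added
# for illustration; the thread only names /tmp/.X11-unix and mode 1777.
ensure_x11_socket_dir() {
    dir="${1:-/tmp/.X11-unix}"   # default is the path named in the thread
    mkdir -p "$dir" || return 1  # succeeds if the directory already exists
    chmod 1777 "$dir"            # world-writable, sticky bit set
}
```

Against the real /tmp/.X11-unix this needs root, for example by saving the function in a script that calls it and running that script with sudo. Xvfb is then usually started with a display number and a screen number, for example `Xvfb :1 -screen 0 1024x768x24 &` followed by `export DISPLAY=:1`. The display :1 and the 1024x768x24 geometry are assumptions here, since the command quoted in step 9 gives neither a display number nor a standard geometry.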