Re: Nutch-Selenium in Nutch 1.10

2015-02-19 Thread Jiaxin Ye
>> location: class HttpWebClient
>>
>> error: cannot find symbol
>> [javac]   new WebDriverWait(driver, 3);
>> [javac]   ^
>> [javac]   symbol:   class WebDriverWait
>> [javac]   location: class HttpWebClient
>>
>> error: cannot find symbol
>> [javac]   String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>> [javac]   ^
>> [javac]   symbol:   variable By
>> [javac]   location: class HttpWebClient
>>
>> Thanks,
>> Jaydeep
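The `cannot find symbol` errors above mean `javac` could not resolve `WebDriverWait` and `By`. In the Selenium Java client these live in `org.openqa.selenium.support.ui` and `org.openqa.selenium` respectively, so a likely cause is a missing `import` or a Selenium jar that is not on the plugin's compile classpath. As a hedged diagnostic only (not part of the NUTCH-1933 patch), the sketch below scans a directory of jars and reports which jar, if any, contains each class. The default directory name `lib` is an assumption; pass whichever lib folder your build actually uses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

final class JarClassFinder {

    /** Converts "org.openqa.selenium.By" into the jar entry "org/openqa/selenium/By.class". */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every *.jar directly under libDir that contains the given class file. */
    static List<Path> jarsContaining(Path libDir, String className) throws IOException {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (ZipFile zip = new ZipFile(jar.toFile())) {
                    if (zip.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default directory; pass the folder to inspect as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        String[] wanted = {
            "org.openqa.selenium.By",
            "org.openqa.selenium.support.ui.WebDriverWait"
        };
        for (String cls : wanted) {
            List<Path> hits = jarsContaining(libDir, cls);
            System.out.println(cls + " -> " + (hits.isEmpty() ? "NOT FOUND" : hits));
        }
    }
}
```

If a class is found but compilation still fails, the next thing to check is whether the source file actually imports it.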

Re: Nutch-Selenium in Nutch 1.10

2015-02-19 Thread Jaydeep Bagrecha



Re: Nutch-Selenium in Nutch 1.10

2015-02-18 Thread Jaydeep Bagrecha

Re: Nutch-Selenium in Nutch 1.10

2015-02-17 Thread Jiaxin Ye


Re: Nutch-Selenium in Nutch 1.10

2015-02-17 Thread Jaydeep Bagrecha



Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Jiaxin Ye
Sure. I will do it once I confirm it works...

On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> This is great, Jiaxin, can you please make a wiki page on the Nutch
> wiki that has this information?

Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Mattmann, Chris A (3980)
This is great, Jiaxin, can you please make a wiki page on the Nutch
wiki that has this information?

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Jiaxin Ye 
Reply-To: "dev@nutch.apache.org" 
Date: Thursday, February 12, 2015 at 9:39 PM
To: "dev@nutch.apache.org" 
Subject: Nutch-Selenium in Nutch 1.10

>Hi Li, Shuo. You are so right. I finished installing and successfully ran
>Nutch with Selenium and Firefox. I have a question, though: does your
>Firefox pop up for every URL you crawl?
>
>
>Hi Prof Mattmann. I think this is how to install Selenium on a Mac running
>OS X later than 10.6 (I think...):
>
>
>1. Download XQuartz; it's a .dmg file, so install it directly.
>2. Download Nutch 1.10.
>3. Download the patch and put it in the Nutch project directory.
>4. patch -p0 < THE PATCH NAME
>5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on GitHub
>tells you to. The patch already updates those .xml files for us, and it
>also installs lib-selenium and protocol-selenium (correct me if I am
>wrong).
>6. Update the Tika dependency if needed.
>7. Go to the Nutch project directory and run ant runtime.
>8. Download Firefox.
>9. Open a new terminal and type
>Xvfb -screen 0 1024x768x24 (I think you can make it smaller if you
>want...)
>You may see some errors after entering the command (I did). If so,
>create the /tmp/.X11-unix directory manually with sudo, set its mode to
>1777, and rerun the command. Xvfb should then work.
>10. Go to nutch/runtime/local and run the crawl command.
>
>
>Hope it helps. :)
>
>
>Best,
>Jiaxin
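Steps 9 and 10 amount to starting a virtual X server and then running the crawl with `DISPLAY` pointing at it, so that the Firefox instance Selenium launches has a display to draw on. A minimal sketch of scripting that from Java, assuming the conventional `Xvfb :N -screen 0 WxHxD` syntax, a 24-bit depth, a hypothetical display number `:99`, and a crawl command you supply yourself (none of these values come from the patch):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class XvfbCrawl {

    /** Builds an Xvfb command line such as: Xvfb :99 -screen 0 1024x768x24 */
    static List<String> xvfbCommand(int display, int width, int height, int depth) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Xvfb");
        cmd.add(":" + display);
        cmd.add("-screen");
        cmd.add("0");
        cmd.add(width + "x" + height + "x" + depth);
        return cmd;
    }

    /** Prepares the crawl process so that any browser it starts uses the virtual display. */
    static ProcessBuilder crawlOnDisplay(int display, List<String> crawlCommand) {
        ProcessBuilder pb = new ProcessBuilder(crawlCommand);
        pb.environment().put("DISPLAY", ":" + display);
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: XvfbCrawl <crawl command and its arguments>");
            return;
        }
        int display = 99; // hypothetical; any unused display number should do
        Process xvfb = new ProcessBuilder(xvfbCommand(display, 1024, 768, 24)).inheritIO().start();
        try {
            Thread.sleep(2000); // crude assumption: give Xvfb a moment to come up
            List<String> crawl = Arrays.asList(args); // run this from runtime/local
            int exit = crawlOnDisplay(display, crawl).start().waitFor();
            System.out.println("crawl exited with status " + exit);
        } finally {
            xvfb.destroy(); // stop the virtual X server once the crawl finishes
        }
    }
}
```

The shell equivalent is to start Xvfb in the background and export `DISPLAY=:99` in the terminal you crawl from; the Java version just makes the pairing explicit.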

Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Shuo Li
I think I have possibly finished installing.

What you need to do:
0. Run git status and check out (revert) any files you have modified.
1. patch -p0 < YOUR_PATCH_FILE
2. ant clean jar
3. ant runtime

Will try crawling using selenium later on. Hope this helped. >_<
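Shuo's steps can be run as one fail-fast sequence. The sketch below is a hedged illustration, not part of the patch: it assumes it is started from the Nutch checkout, takes the patch file name as its only argument, and feeds the patch to `patch -p0` on standard input, which is what the shell's `<` redirection does. It runs `git status` only and deliberately leaves reverting local changes (step 0) to you.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ApplyPatchAndBuild {

    /** Builds the steps in order; each ProcessBuilder is one command from the list above. */
    static List<ProcessBuilder> steps(File nutchHome, File patchFile) {
        List<ProcessBuilder> steps = new ArrayList<>();
        steps.add(new ProcessBuilder("git", "status"));
        // Equivalent of: patch -p0 < YOUR_PATCH_FILE
        steps.add(new ProcessBuilder("patch", "-p0").redirectInput(patchFile));
        steps.add(new ProcessBuilder("ant", "clean", "jar"));
        steps.add(new ProcessBuilder("ant", "runtime"));
        for (ProcessBuilder pb : steps) {
            pb.directory(nutchHome); // every command runs inside the Nutch checkout
            pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        }
        return steps;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: ApplyPatchAndBuild <patch-file>");
            System.exit(2);
        }
        File nutchHome = new File(".").getAbsoluteFile();
        File patch = new File(args[0]).getAbsoluteFile();
        for (ProcessBuilder pb : steps(nutchHome, patch)) {
            int exit = pb.start().waitFor();
            if (exit != 0) { // stop at the first failing step, as you would by hand
                System.err.println("step failed (" + exit + "): " + pb.command());
                System.exit(exit);
            }
        }
    }
}
```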

On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

>  Yes, I believe you need to install X11 - why don't you try it and report
> back what you find? Thanks.
>
> Sent from my iPhone
>
> On Feb 12, 2015, at 8:28 AM, Jiaxin Ye  wrote:
>
>  Hi professor, but can we use Selenium on Mac?
>
> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>> polar dataset you have been assigned in my CSCI 572 search engines class.
>>
>> The instructions for integrating Selenium with Nutch 1.10-trunk
>> are here:
>>
>> https://issues.apache.org/jira/browse/NUTCH-1933
>>
>>
>> Cheers,
>> Chris
>>
>> -Original Message-
>> From: Jiaxin Ye 
>> Reply-To: "dev@nutch.apache.org" 
>> Date: Thursday, February 12, 2015 at 12:46 AM
>> To: "dev@nutch.apache.org" 
>> Subject: Re: Nutch-Selenium in Nutch 1.10
>>
>> >Well, good choice. I am thinking of changing to Ubuntu now. But why do
>> >we need Selenium anyway? Just to make crawling easier?
>> >
>> >On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>> > wrote:
>> >
>> >Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so
>> >I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
>> >still be installed properly. The issue is that I don't know how to
>> >integrate Selenium with Nutch 1.10.
>> >
>> >On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>> > wrote:
>> >
>> >Hi all,
>> >
>> >
>> >Does anyone here know where to find a setup tutorial for Selenium on Mac?
>> >I'm finding it difficult to install Xvfb on Mac.
>> >
>> >
>> >Best,
>> >Jiaxin
>> >
>> >
>> >On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>> > wrote:
>> >
>> >Hi Shuo Li,
>> >
>> >
>> >We were facing a similar issue. Prof. Mattmann suggested we look into this
>> >patch for Selenium on Nutch 1.10:
>> >https://issues.apache.org/jira/browse/NUTCH-1933.
>> >
>> >
>> >Hope this helps!
>> >
>> >
>> >Thanks,
>> >Sapna
>> >
>> >On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>> > wrote:
>> >
>> >Yop,
>> >
>> >
>> >I'm trying to install Selenium in Nutch 1.10. However, this error pops
>> >up:
>> >
>> >
>> >error: package org.apache.nutch.storage does not exist
>> >
>> >
>> >
>> >I can only find this package in Nutch 2.x. Is there a way to use Selenium
>> >in 1.10?
>> >
>> >
>> >Any advice would be appreciated.
>> >
>> >
>> >Regards,
>> >Shuo Li
>> >
>> >--
>> >Graduate Student
>> >MS in CS (Data Science)
>> >Viterbi School of Engineering
>> >University of Southern California
>> >
>> >
>> >Phone:
>> >+1 650-307-9848 
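Shuo's original error arises because `org.apache.nutch.storage` belongs to the Nutch 2.x codebase (as he notes, it only exists there), so code written against 2.x will not compile on 1.10; that is why the thread points to the 1.x patch on NUTCH-1933. A quick, hedged way to spot 2.x-only code before building is to search the sources for imports from that package. The default directory `src/plugin`, the UTF-8 source encoding, and the line-prefix match are assumptions of this sketch, not anything Nutch provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StorageImportScanner {

    private static final String NUTCH2_PACKAGE = "org.apache.nutch.storage";

    /** True if a source line imports from the Nutch 2.x storage package. */
    static boolean importsNutch2Storage(String line) {
        String t = line.trim();
        return t.startsWith("import " + NUTCH2_PACKAGE + ".")
            || t.startsWith("import static " + NUTCH2_PACKAGE + ".");
    }

    /** Returns "file:lineNumber" for every such import in .java files under root. */
    static List<String> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path src : sources) {
            List<String> lines = Files.readAllLines(src); // assumes UTF-8 sources
            for (int i = 0; i < lines.size(); i++) {
                if (importsNutch2Storage(lines.get(i))) {
                    hits.add(src + ":" + (i + 1));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src/plugin"); // hypothetical default
        scan(root).forEach(System.out::println);
    }
}
```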


Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Mattmann, Chris A (3980)
Yes, I believe you need to install X11. Why don't you try it and report back
what you find? Thanks.
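A minimal Java pre-flight check along those lines, sketched under assumptions rather than taken from NUTCH-1933: before starting a Selenium-backed crawl, it looks for an Xvfb binary. The class XvfbLocator is hypothetical, and so are the candidate paths: XQuartz usually installs its X11 tools under /opt/X11 on OS X, and Ubuntu's xvfb package usually provides /usr/bin/Xvfb.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical pre-flight check: finds an Xvfb binary before Selenium is used.
// The candidate paths below are assumptions, not taken from NUTCH-1933.
class XvfbLocator {

    // Returns the locations to probe for the given operating system name.
    static String[] candidatePaths(String osName) {
        if (osName.toLowerCase(Locale.ROOT).contains("mac")) {
            // Assumed XQuartz install location on OS X.
            return new String[] {"/opt/X11/bin/Xvfb"};
        }
        // Assumed locations on Linux (e.g. Ubuntu's xvfb package).
        return new String[] {"/usr/bin/Xvfb", "/usr/local/bin/Xvfb"};
    }

    // Returns the first candidate that exists and is executable, or null if none is.
    static String find(String osName) {
        for (String candidate : candidatePaths(osName)) {
            if (Files.isExecutable(Paths.get(candidate))) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String xvfb = find(os);
        if (xvfb == null) {
            System.out.println("No Xvfb found on " + os + "; install X11 first (XQuartz on a Mac).");
        } else {
            System.out.println("Xvfb found at " + xvfb);
        }
    }
}
```

If the check fails on a Mac, installing XQuartz (as suggested above) and re-running it is a quick way to confirm the X11 tools landed where expected.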

Sent from my iPhone

On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <jiaxi...@usc.edu> wrote:

Hi Professor, can we use Selenium on Mac, though?

On Thursday, February 12, 2015, Mattmann, Chris A (3980)
<chris.a.mattm...@jpl.nasa.gov> wrote:
You need Selenium, Jiaxin, in order to crawl dynamic pages in the
polar dataset you have been assigned in my CSCI 572 search engines class.

The instructions for integrating Selenium with Nutch 1.10-trunk
are here:

https://issues.apache.org/jira/browse/NUTCH-1933


Cheers,
Chris


++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Jiaxin Ye
Reply-To: "dev@nutch.apache.org"
Date: Thursday, February 12, 2015 at 12:46 AM
To: "dev@nutch.apache.org"
Subject: Re: Nutch-Selenium in Nutch 1.10

>Well, good choice. I am thinking of changing to Ubuntu now. The thing is,
>why do we need Selenium anyway? Does it just make crawling easier?
>
>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
> wrote:
>
>Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
>I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
>still be installed properly. The issue is that I don't know how to
>integrate Selenium with Nutch 1.10.
>
>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
> wrote:
>
>Hi all,
>
>
>Does anyone here know where to find a setup tutorial for Selenium on Mac?
>I find it difficult to install Xvfb on Mac.
>
>
>Best,
>Jiaxin
>
>
>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
> wrote:
>
>Hi Shuo Li,
>
>
>We were facing a similar issue. Prof. Mattmann suggested we look into this
>patch for Selenium on Nutch 1.10:
>https://issues.apache.org/jira/browse/NUTCH-1933.
>
>
>Hope this helps!
>
>
>Thanks,
>Sapna
>
>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> wrote:
>
>Yop,
>
>
>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>out:
>
>
>error: package org.apache.nutch.storage does not exist
>
>
>
>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>in 1.10?
>
>
>Any advice would be appreciated.
>
>
>Regards,
>Shuo Li
>
>--
>Graduate Student
>MS in CS (Data Science)
>Viterbi School of Engineering
>University of Southern California
>
>
>Phone:
>+1 650-307-9848 
>



Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Jiaxin Ye
Hi Professor, can we use Selenium on Mac, though?

On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> You need Selenium, Jiaxin, in order to crawl dynamic pages in the
> polar dataset you have been assigned in my CSCI 572 search engines class.
>
> The instructions for integrating Selenium with Nutch 1.10-trunk
> are here:
>
> https://issues.apache.org/jira/browse/NUTCH-1933
>
>
> Cheers,
> Chris
>
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov 
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -Original Message-
> From: Jiaxin Ye
> Reply-To: "dev@nutch.apache.org"
> Date: Thursday, February 12, 2015 at 12:46 AM
> To: "dev@nutch.apache.org"
> Subject: Re: Nutch-Selenium in Nutch 1.10
>
> >Well, good choice. I am thinking of changing to Ubuntu now. The thing is,
> >why do we need Selenium anyway? Does it just make crawling easier?
> >
> >On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
> > wrote:
> >
> >Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
> >I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
> >still be installed properly. The issue is that I don't know how to
> >integrate Selenium with Nutch 1.10.
> >
> >On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
> > wrote:
> >
> >Hi all,
> >
> >
> >Does anyone here know where to find a setup tutorial for Selenium on Mac?
> >I find it difficult to install Xvfb on Mac.
> >
> >
> >Best,
> >Jiaxin
> >
> >
> >On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
> > wrote:
> >
> >Hi Shuo Li,
> >
> >
> >We were facing a similar issue. Prof. Mattmann suggested we look into this
> >patch for Selenium on Nutch 1.10:
> >https://issues.apache.org/jira/browse/NUTCH-1933.
> >
> >
> >Hope this helps!
> >
> >
> >Thanks,
> >Sapna
> >
> >On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> > wrote:
> >
> >Yop,
> >
> >
> >I'm trying to install Selenium in Nutch 1.10. However, this error pops
> >out:
> >
> >
> >error: package org.apache.nutch.storage does not exist
> >
> >
> >
> >I can only find this package in Nutch 2.x. Is there a way to use Selenium
> >in 1.10?
> >
> >
> >Any advice would be appreciated.
> >
> >
> >Regards,
> >Shuo Li
> >
> >--
> >Graduate Student
> >MS in CS (Data Science)
> >Viterbi School of Engineering
> >University of Southern California
> >
> >
> >Phone:
> >+1 650-307-9848 
> >
>
>


Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Mattmann, Chris A (3980)
You need Selenium, Jiaxin, in order to crawl dynamic pages in the
polar dataset you have been assigned in my CSCI 572 search engines class.
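To make the point about dynamic pages concrete: a plain HTTP fetch only gets the HTML the server sends, before any JavaScript runs, while Selenium drives a real browser and captures the page after scripts have filled it in. The hypothetical Java heuristic below (not part of Nutch or the NUTCH-1933 patch) flags a fetched page as a likely candidate for browser rendering when it contains script blocks but almost no visible text. Its regex-based tag stripping is a rough assumption, not a real HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic, not part of Nutch or the NUTCH-1933 patch.
class DynamicPageHeuristic {

    // Matches whole <script>...</script> blocks, across line breaks.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any remaining tag; a rough approximation, not an HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Text a reader would see in the static HTML: scripts and tags removed,
    // whitespace collapsed.
    static String visibleText(String html) {
        String noScripts = SCRIPT.matcher(html).replaceAll(" ");
        return TAG.matcher(noScripts).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    // True when the page ships scripts but fewer than minTextChars visible characters,
    // i.e. its content is probably filled in by JavaScript after loading.
    static boolean likelyNeedsRendering(String html, int minTextChars) {
        return SCRIPT.matcher(html).find() && visibleText(html).length() < minTextChars;
    }

    public static void main(String[] args) {
        String staticPage = "<html><body><p>Plain server-rendered text.</p></body></html>";
        String dynamicPage = "<html><body><div id=\"data\"></div>"
                + "<script>loadData('#data');</script></body></html>";
        System.out.println(likelyNeedsRendering(staticPage, 20));  // false: no scripts
        System.out.println(likelyNeedsRendering(dynamicPage, 20)); // true: empty body plus script
    }
}
```

The threshold is a judgment call; the point is only that the static HTML of a script-driven page can look nearly empty, which is exactly the content Selenium recovers.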

The instructions for integrating Selenium with Nutch 1.10-trunk
are here: 

https://issues.apache.org/jira/browse/NUTCH-1933


Cheers,
Chris


++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Jiaxin Ye 
Reply-To: "dev@nutch.apache.org" 
Date: Thursday, February 12, 2015 at 12:46 AM
To: "dev@nutch.apache.org" 
Subject: Re: Nutch-Selenium in Nutch 1.10

>Well, good choice. I am thinking of changing to Ubuntu now. The thing is,
>why do we need Selenium anyway? Does it just make crawling easier?
>
>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
> wrote:
>
>Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
>I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
>still be installed properly. The issue is that I don't know how to
>integrate Selenium with Nutch 1.10.
>
>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
> wrote:
>
>Hi all,
>
>
>Does anyone here know where to find a setup tutorial for Selenium on Mac?
>I find it difficult to install Xvfb on Mac.
>
>
>Best,
>Jiaxin
>
>
>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
> wrote:
>
>Hi Shuo Li,
>
>
>We were facing a similar issue. Prof. Mattmann suggested we look into this
>patch for Selenium on Nutch 1.10:
>https://issues.apache.org/jira/browse/NUTCH-1933.
>
>
>Hope this helps!
>
>
>Thanks,
>Sapna
>
>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> wrote:
>
>Yop,
>
>
>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>out:
>
>
>error: package org.apache.nutch.storage does not exist
>
>
>
>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>in 1.10? 
>
>
>Any advice would be appreciated.
>
>
>Regards,
>Shuo Li
>
>-- 
>Graduate Student
>MS in CS (Data Science)
>Viterbi School of Engineering
>University of Southern California
>
>
>Phone: 
>+1 650-307-9848 
>



Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Jiaxin Ye
Well, good choice. I am thinking of changing to Ubuntu now. The thing is,
why do we need Selenium anyway? Does it just make crawling easier?

On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li  wrote:

> Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
> I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
> still be installed properly. The issue is that I don't know how to
> integrate Selenium with Nutch 1.10.
>
> On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye  wrote:
>
>> Hi all,
>>
>> Does anyone here know where to find a setup tutorial for Selenium on Mac?
>> I find it difficult to install Xvfb on Mac.
>>
>> Best,
>> Jiaxin
>>
>> On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh 
>> wrote:
>>
>>> Hi Shuo Li,
>>>
>>> We were facing a similar issue. Prof. Mattmann suggested we look into
>>> this patch for Selenium on Nutch 1.10:
>>> https://issues.apache.org/jira/browse/NUTCH-1933.
>>>
>>> Hope this helps!
>>>
>>> Thanks,
>>> Sapna
>>>
>>> On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li  wrote:
>>>
>>>> Yop,
>>>>
>>>> I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>>> out:
>>>>
>>>> *error: package org.apache.nutch.storage does not exist*
>>>>
>>>> I can only find this package in Nutch 2.x. Is there a way to use
>>>> Selenium in 1.10?
>>>>
>>>> Any advice would be appreciated.
>>>>
>>>> Regards,
>>>> Shuo Li
>>>>
>>>
>>>
>>>
>>> --
>>> Graduate Student
>>> MS in CS (Data Science)
>>> Viterbi School of Engineering
>>> University of Southern California
>>>
>>> Phone: +1 650-307-9848
>>>
>>
>>
>


Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Shuo Li
Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
still be installed properly. The issue is that I don't know how to
integrate Selenium with Nutch 1.10.
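On a GUI-less Ubuntu box like this, the usual pattern is to start an Xvfb virtual display and point the DISPLAY environment variable at it, so that the Firefox instance Selenium launches from inside Nutch typically inherits an X server to draw on. Below is a minimal Java sketch of that wiring, assuming display :99 and a 1024x768x24 screen (both arbitrary choices). HeadlessCrawlLauncher is a hypothetical helper, and the bin/crawl arguments are placeholders to check against the usage message of your Nutch build. The sketch only prepares the process builders; it does not start them.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that prepares (but does not start) the two processes a
// headless Selenium crawl usually needs on a GUI-less Linux box.
class HeadlessCrawlLauncher {

    // Command for a virtual X display; the screen geometry is an arbitrary choice.
    static ProcessBuilder xvfb(int displayNumber) {
        return new ProcessBuilder("Xvfb", ":" + displayNumber, "-screen", "0", "1024x768x24");
    }

    // Crawl process whose environment points DISPLAY at the virtual display, so a
    // browser started from inside it (e.g. by Selenium) can find an X server.
    static ProcessBuilder crawl(int displayNumber, List<String> crawlCommand) {
        ProcessBuilder pb = new ProcessBuilder(crawlCommand);
        pb.environment().put("DISPLAY", ":" + displayNumber);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder display = xvfb(99);
        // Placeholder arguments: check the usage message of bin/crawl in your build.
        ProcessBuilder nutch = crawl(99, Arrays.asList("bin/crawl", "urls", "crawl", "2"));
        System.out.println(String.join(" ", display.command()));
        System.out.println("DISPLAY=" + nutch.environment().get("DISPLAY"));
        // To actually run them: display.start(); then nutch.inheritIO().start().waitFor();
    }
}
```

Doing the same from a shell (start Xvfb in the background, export DISPLAY, then run the crawl) is equivalent; the Java form just makes the dependency between the two processes explicit.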

On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye  wrote:

> Hi all,
>
> Does anyone here know where to find a setup tutorial for Selenium on Mac?
> I find it difficult to install Xvfb on Mac.
>
> Best,
> Jiaxin
>
> On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh 
> wrote:
>
>> Hi Shuo Li,
>>
>> We were facing a similar issue. Prof. Mattmann suggested we look into this
>> patch for Selenium on Nutch 1.10:
>> https://issues.apache.org/jira/browse/NUTCH-1933.
>>
>> Hope this helps!
>>
>> Thanks,
>> Sapna
>>
>> On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li  wrote:
>>
>>> Yop,
>>>
>>> I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> out:
>>>
>>> *error: package org.apache.nutch.storage does not exist*
>>>
>>> I can only find this package in Nutch 2.x. Is there a way to use
>>> Selenium in 1.10?
>>>
>>> Any advice would be appreciated.
>>>
>>> Regards,
>>> Shuo Li
>>>
>>
>>
>>
>> --
>> Graduate Student
>> MS in CS (Data Science)
>> Viterbi School of Engineering
>> University of Southern California
>>
>> Phone: +1 650-307-9848
>>
>
>


Re: Nutch-Selenium in Nutch 1.10

2015-02-12 Thread Jiaxin Ye
Hi all,

Does anyone here know where to find a setup tutorial for Selenium on Mac? I
find it difficult to install Xvfb on Mac.

Best,
Jiaxin

On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh  wrote:

> Hi Shuo Li,
>
> We were facing a similar issue. Prof. Mattmann suggested we look into this
> patch for Selenium on Nutch 1.10:
> https://issues.apache.org/jira/browse/NUTCH-1933.
>
> Hope this helps!
>
> Thanks,
> Sapna
>
> On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li  wrote:
>
>> Yop,
>>
>> I'm trying to install Selenium in Nutch 1.10. However, this error pops
>> out:
>>
>> *error: package org.apache.nutch.storage does not exist*
>>
>> I can only find this package in Nutch 2.x. Is there a way to use Selenium
>> in 1.10?
>>
>> Any advice would be appreciated.
>>
>> Regards,
>> Shuo Li
>>
>
>
>
> --
> Graduate Student
> MS in CS (Data Science)
> Viterbi School of Engineering
> University of Southern California
>
> Phone: +1 650-307-9848
>


Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Mattmann, Chris A (3980)
Perfect, that’s what I suggested, thanks guys!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Sapnashri Suresh 
Reply-To: "dev@nutch.apache.org" 
Date: Tuesday, February 10, 2015 at 9:42 PM
To: "dev@nutch.apache.org" 
Subject: Re: Nutch-Selenium in Nutch 1.10

>Hi Shuo Li,
>
>
>We were facing a similar issue. Prof. Mattmann suggested we look into this
>patch for Selenium on Nutch 1.10:
>https://issues.apache.org/jira/browse/NUTCH-1933.
>
>
>Hope this helps!
>
>
>Thanks,
>Sapna
>
>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
> wrote:
>
>Yop,
>
>
>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>out:
>
>
>error: package org.apache.nutch.storage does not exist
>
>
>
>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>in 1.10? 
>
>
>Any advice would be appreciated.
>
>
>Regards,
>Shuo Li
>
>-- 
>Graduate Student
>MS in CS (Data Science)
>Viterbi School of Engineering
>University of Southern California
>
>
>Phone: +1 650-307-9848
>



Re: Nutch-Selenium in Nutch 1.10

2015-02-10 Thread Sapnashri Suresh
Hi Shuo Li,

We were facing a similar issue. Prof. Mattmann suggested we look into this
patch for Selenium on Nutch 1.10:
https://issues.apache.org/jira/browse/NUTCH-1933.

Hope this helps!

Thanks,
Sapna

On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li  wrote:

> Yop,
>
> I'm trying to install Selenium in Nutch 1.10. However, this error pops out:
>
> *error: package org.apache.nutch.storage does not exist*
>
> I can only find this package in Nutch 2.x. Is there a way to use Selenium
> in 1.10?
>
> Any advice would be appreciated.
>
> Regards,
> Shuo Li
>



-- 
Graduate Student
MS in CS (Data Science)
Viterbi School of Engineering
University of Southern California

Phone: +1 650-307-9848
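The org.apache.nutch.storage package belongs to the storage layer of Nutch 2.x (built on Apache Gora). That is why a Selenium plugin written for 2.x does not compile against 1.10, and why a 1.x port such as the NUTCH-1933 patch is needed. Below is a small hypothetical Java diagnostic for telling which line of Nutch is on the classpath. The marker class name org.apache.nutch.storage.WebPage is an assumption based on the Nutch 2.x sources.

```java
// Hypothetical diagnostic: is a Nutch 2.x storage class visible on the classpath?
class NutchLineCheck {

    // Looks the class up without initializing it; false if it cannot be found.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, NutchLineCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed marker class from the Nutch 2.x (Gora-based) storage layer.
        String nutch2Marker = "org.apache.nutch.storage.WebPage";
        if (isOnClasspath(nutch2Marker)) {
            System.out.println("Found " + nutch2Marker + ": this looks like Nutch 2.x.");
        } else {
            System.out.println(nutch2Marker + " is missing: this looks like Nutch 1.x, "
                    + "so use a 1.x port of the plugin such as the NUTCH-1933 patch.");
        }
    }
}
```

Running it with the same classpath used to build the plugin shows quickly whether the compile error comes from mixing a 2.x plugin with a 1.x checkout.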