Hey guys,

After changing my RAM to 2GB, everything works fine. My bad. Thanks for your help.
Regards,
Shuo Li

On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Thank you Mo. I sincerely appreciate your guidance and contribution.
>
> I will work to get your nutch selenium grid plugin contributed
> to work with Nutch 1.x.
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Mo Omer <beancinemat...@gmail.com>
> Date: Friday, February 13, 2015 at 11:10 AM
> To: Chris Mattmann <chris.a.mattm...@jpl.nasa.gov>
> Cc: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>
> >Hey all,
> >
> >When I had run nutch-selenium, it was in a config such that zombies were
> >created from closing Firefox windows and they couldn't be reaped (again,
> >due to the docker configuration I had).
> >
> >In a normal setup, it should not be an issue, but if you're running 20
> >threads in Nutch, that's potentially 20 open FF windows, which isn't good
> >for 512MB.
> >
> >Selenium Grid is much more efficient, in that browsers are opened, but
> >tabs are used to fetch sites, and only the tabs are closed.
> >
> >Additionally, ensure you're using Nutch 2.2.1.
> >
> >Feel free to fork, patch, tinker, and PR as needed.
> >
> >Chris, if you want to be added to the contributors on the GitHub project,
> >that's cool with me!
> >I wish I could dedicate more time to this, but I don't
> >foresee using Nutch again in the near future, and I am now working on
> >projects that require lots of reading and possibly patches to Caffe and
> >OpenCL R-CNN.
> >
> >Tl;dr:
> >- No, this shouldn't be typical unless you're creating zombies like crazy
> >that aren't being reaped (too many open file descriptors), running
> >out of memory, or hitting a similar resource constraint.
> >- Selenium Grid is TONS more efficient, but a bit more difficult to set
> >up. I used it to crawl hundreds of thousands of sites.
> >- Unfortunately I can't commit more time to this, but if I can assist in
> >any administrative way, let me know.
> >
> >Thank you,
> >
> >Mo
> >
> >This message was drafted on a tiny touch screen; please forgive brevity &
> >tpyos
> >
> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
> >><chris.a.mattm...@jpl.nasa.gov> wrote:
> >>
> >> Oh yes, please increase your memory to at least 2GB.
> >>
> >> -----Original Message-----
> >> From: Shuo Li <sli...@usc.edu>
> >> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >> Date: Friday, February 13, 2015 at 10:38 AM
> >> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >> Cc: Mo Omer <beancinemat...@gmail.com>
> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
> >>
> >>> Hey Mo and Prof Mattmann,
> >>>
> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD,
> >>> NSF ACADIS and NSIDC Arctic Data Explorer). I will let you know what's
> >>> going on.
> >>>
> >>> Is memory an issue? My Vagrant box only has 512MB of memory.
> >>>
> >>> Regards,
> >>> Shuo Li
> >>>
> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
> >>> <chris.a.mattm...@jpl.nasa.gov> wrote:
> >>>
> >>> Hi Shuo,
> >>>
> >>> Thanks for your email. I wonder if using Selenium Grid would
> >>> help?
> >>>
> >>> Please see this plugin:
> >>>
> >>> https://github.com/momer/nutch-selenium-grid-plugin
> >>>
> >>> I'm CC'ing Mo, the author of the plugin, to see if he experienced
> >>> this while running the original Selenium plugin. Mo, did using
> >>> Selenium Grid help with the issue that Shuo is experiencing below?
> >>>
> >>> Mo: are you cool with porting the grid plugin, or with Lewis or
> >>> me porting it to trunk (with full credit to you, of course)?
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> -----Original Message-----
> >>> From: Shuo Li <sli...@usc.edu>
> >>> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >>> Date: Friday, February 13, 2015 at 10:12 AM
> >>> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >>> Subject: Vagrant Crushed When using Nutch-Selenium
> >>>
> >>>> Hey guys,
> >>>>
> >>>> I'm trying to use Nutch-Selenium to crawl
> >>>> nutch.apache.org <http://nutch.apache.org>. However, my Vagrant box
> >>>> seems to have crashed after a few minutes. I forced it to shut down,
> >>>> and it turns out it only crawled 59 websites. My Nutch version is 1.10
> >>>> and my OS is Ubuntu Trusty, 14.04.
> >>>>
> >>>> Is there anything I can provide to you guys? Does anybody else have
> >>>> the same issue? Or is 59 websites the complete crawl?
> >>>>
> >>>> Any suggestion would be appreciated.
> >>>>
> >>>> Regards,
> >>>> Shuo Li