Hey guys,

After changing my RAM to 2GB, everything works fine. My bad. Thanks for your
help.
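
In case it helps anyone who hits the same thing, here is a rough sketch of the
memory arithmetic Mo describes (one Firefox window per fetcher thread). The
function name and both figures (150 MB per window, 256 MB reserved for the OS
and the Nutch JVM) are assumptions for illustration only, not measurements
from my setup.

```python
def max_browser_windows(total_mb, per_window_mb, reserved_mb=0):
    """Return how many browser windows fit in the memory left over
    after reserving some for the OS and the crawler itself."""
    if per_window_mb <= 0:
        raise ValueError("per_window_mb must be positive")
    usable_mb = max(total_mb - reserved_mb, 0)
    return usable_mb // per_window_mb


# Hypothetical figures; measure your own VM before relying on them.
PER_WINDOW_MB = 150  # assumed footprint of one Firefox window
RESERVED_MB = 256    # assumed headroom for the OS and the Nutch JVM

if __name__ == "__main__":
    for total_mb in (512, 2048):
        fits = max_browser_windows(total_mb, PER_WINDOW_MB, RESERVED_MB)
        print(f"{total_mb} MB -> room for about {fits} window(s)")
```

If the number of fetcher threads is well above what fits, either the VM needs
more memory or the thread count needs to come down.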

Regards,
Shuo Li

On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Thank you Mo. I sincerely appreciate your guidance and contribution.
>
> I will work to get your nutch selenium grid plugin contributed
> to work with Nutch 1.x.
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Mo Omer <beancinemat...@gmail.com>
> Date: Friday, February 13, 2015 at 11:10 AM
> To: Chris Mattmann <chris.a.mattm...@jpl.nasa.gov>
> Cc: "dev@nutch.apache.org" <dev@nutch.apache.org>
> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>
> >Hey all,
> >
> >When I had run nutch-selenium, it was in a config such that zombies were
> >created from closing Firefox windows and they couldn't be reaped (again,
> >due to the docker configuration I had).
> >
> >In a normal setup, it should not be an issue - if you're running 20
> >threads in nutch that's potentially 20 open FF windows which isn't good
> >for 512mb.
> >
> >Selenium grid is much more efficient, in that browsers are opened, but
> >tabs are used to fetch sites - and only those are closed.
> >
> >Additionally, ensure you're using Nutch 2.2.1.
> >
> >Feel free to fork, patch, tinker, and PR as needed.
> >
> >Chris, if you want to be added to contribs on the GitHub project, that's
> >cool with me! Wish I could dedicate more time to this, but I don't
> >foresee using Nutch again in the near future, and am now working on
> >projects that require lots of reading and possibly patches to Caffe and
> >OpenCL R-CNN projects.
> >
> >Tl;dr:
> >- no, this shouldn't be typical unless you're creating zombies like crazy
> >and they're not being reaped (too many open file descriptors), running
> >out of memory, or similar resource constraint.
> >- selenium grid is TONs more efficient, but a bit more difficult to set
> >up. I used it to crawl 100ks of sites.
> >- unfortunately I can't commit more time to this, but if I can assist in
> >any admin way, let me know.
> >
> >Thank you,
> >
> >Mo
> >
> >This message was drafted on a tiny touch screen; please forgive brevity &
> >tpyos
> >
> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
> >><chris.a.mattm...@jpl.nasa.gov> wrote:
> >>
> >> Oh yes, please up your memory to at least 2GB.
> >>
> >> -----Original Message-----
> >> From: Shuo Li <sli...@usc.edu>
> >> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >> Date: Friday, February 13, 2015 at 10:38 AM
> >> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >> Cc: Mo Omer <beancinemat...@gmail.com>
> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
> >>
> >>> Hey Mo and Prof Mattmann,
> >>>
> >>>
> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD, NSF
> >>> ACADIS and NSIDC Arctic Data Explorer). I will let you know what's going
> >>> on.
> >>>
> >>>
> >>> Is memory an issue? My vagrant only has 512MB of memory.
> >>>
> >>>
> >>> Regards,
> >>> Shuo Li
> >>>
> >>>
> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
> >>> <chris.a.mattm...@jpl.nasa.gov> wrote:
> >>>
> >>> Hi Shuo,
> >>>
> >>> Thanks for your email. I wonder if using selenium grid would
> >>> help?
> >>>
> >>> Please see this plugin:
> >>>
> >>> https://github.com/momer/nutch-selenium-grid-plugin
> >>>
> >>>
> >>> I’m CC’ing Mo, the author of the plugin, to see if he experienced
> >>> this while running the original selenium plugin. Mo, did using
> >>> selenium grid help with the issue that Shuo is experiencing below?
> >>>
> >>> Mo: are you cool with porting the grid plugin to trunk, or with Lewis
> >>> or me doing it (with full credit to you, of course)?
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Shuo Li <sli...@usc.edu>
> >>> Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >>> Date: Friday, February 13, 2015 at 10:12 AM
> >>> To: "dev@nutch.apache.org" <dev@nutch.apache.org>
> >>> Subject: Vagrant Crushed When using Nutch-Selenium
> >>>
> >>>> Hey guys,
> >>>>
> >>>>
> >>>> I'm trying to use Nutch-Selenium to crawl nutch.apache.org
> >>>> <http://nutch.apache.org>. However, my vagrant seems to have
> >>>> crashed after a few minutes. I forced it to shut down, and it turns
> >>>> out it only crawled 59 websites. My nutch version is 1.10 and my OS
> >>>> is Ubuntu Trusty, 14.04.
> >>>>
> >>>>
> >>>> Is there anything I can provide to you guys? Or does anybody have the
> >>>> same issue? Or is 59 websites the complete crawl?
> >>>>
> >>>>
> >>>> Any suggestion would be appreciated.
> >>>>
> >>>>
> >>>> Regards,
> >>>> Shuo Li
> >>
>
>
