Hi Yunhee,

What error messages do you get while running the crawler?

I faced similar issues with the crawler when I first tried it out. I went
through the crawler user guide to understand the architecture, but I only
understood how it worked after running the crawler several times to ingest
files.

I agree we need to update the guide. If you want to know about the
MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page that
I mentioned before will give you an idea of how to get them working (it
describes the config files that you need to write for those two crawlers).

On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang <[email protected]> wrote:

> Hi Chris,
>
> I got a bunch of error messages when running the crawler_launcher script.
> First off, I think I need to understand how to a crawler works.
> Can I get some materials to help me write configuration files for
> crawler_launcher ?
>
> Honestly I am not familiar with Crawler.
> But I will try to file a JIRA issue to update the Crawler user guide.
>
> Thanks,
> Yunhee
>
>
>
> 2012/8/9 Mattmann, Chris A (388J) <[email protected]>:
> > Hi YunHee,
> >
> > Sorry, we need to update the docs, that is for sure. Can you help
> > us remember by filing a JIRA issue to update the Crawler user
> > guide and to fix the URL there?
> >
> > As for crawlerId, yes it's obsolete, you can find the modern
> > 0.4 and 0.5-trunk options by running ./crawler_launcher -h
> >
> > Cheers,
> > Chris
> >
> > On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:
> >
> >> Hi Chris and Sheryl,
> >>
> >> I understood my mistake after modifying a wrong URL with the "/".
> >> But there is the wrong URL that is used as an option of
> >> crawler_launcher in the apache oodt
> >> homepage(http://oodt.apache.org/components/maven/crawler/user/).
> >> --filemgrUrl http://localhost:9000/ \
> >> So it made me confused.
> >>
> >> I tried to run the command mentioned below according to the home
> >> page of apache oodt.
> >> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
> >> ERROR: Invalid option: 'crawlerId'
> >>
> >> But the error described above was occurred.
> >> Is the option 'crawlerid' obsolete ?
> >>
> >> Thanks,
> >> Yunhee
> >>
> >>
> >> 2012/8/7 Mattmann, Chris A (388J) <[email protected]>:
> >>> Perfect, Sheryl, my thoughts exactly.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
> >>>
> >>>> Hi Yunhee,
> >>>>
> >>>> Check out this OODT wiki for crawler :
> >>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> >>>>
> >>>> Did you try giving 'http://localhost:8000' without the "/" in the end?
> >>>> Also, specify 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
> >>>> for 'clientTransferer' option.
> >>>>
> >>>>
> >>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <[email protected]> wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> I got an error message when I tried to run crawler_launcher by using a
> >>>>> shell script. The error message may be caused by a wrong URL of
> >>>>> filemgr.
> >>>>> $ ./crawler_launcher.sh
> >>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
> >>>>> allowed for option
> >>>>> [longOption='filemgrUrl',shortOption='fm',description='File Manager
> >>>>> URL'] - Allowed values = [http://.*:\d*]
> >>>>>
> >>>>> The following is the shell script that I wrote:
> >>>>> $ cat crawler_launcher.sh
> >>>>> #!/bin/sh
> >>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> >>>>> ./crawler_launcher \
> >>>>> -op --launchStdCrawler \
> >>>>> --productPath $STAGE_AREA\
> >>>>> --filemgrUrl http://localhost:8000/\
> >>>>> --failureDir /tmp \
> >>>>> --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
> >>>>> --metFileExtension tmp \
> >>>>> --clientTransferer
> >>>>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
> >>>>>
> >>>>> I am wondering if there is a problem in the URL of the filemgr or elsewhere
> >>>>>
> >>>>> Thanks,
> >>>>> Yunhee
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> -Sheryl
> >>>
> >>>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Chris Mattmann, Ph.D.
> >>> Senior Computer Scientist
> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> Office: 171-266B, Mailstop: 171-246
> >>> Email: [email protected]
> >>> WWW: http://sunset.usc.edu/~mattmann/
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> Adjunct Assistant Professor, Computer Science Department
> >>> University of Southern California, Los Angeles, CA 90089 USA
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >
>

--
-Sheryl
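
P.S. For anyone hitting the same validation failure: here is a minimal sketch of why the trailing "/" is rejected. It assumes the launcher matches the whole --filemgrUrl value against the pattern printed in the error (http://.*:\d*). The function names and the anchored grep pattern are mine, for illustration only; they are not part of OODT.

```shell
#!/bin/sh
# Pattern taken from the "Allowed values" part of the error message, with \d
# written as [0-9] for POSIX grep. Anchoring it to the whole string is an
# assumption about how the launcher applies it.
FILEMGR_URL_PATTERN='^http://.*:[0-9]*$'

# Hypothetical helper: succeeds if the URL matches the pattern above.
is_valid_filemgr_url() {
    printf '%s\n' "$1" | grep -Eq "$FILEMGR_URL_PATTERN"
}

# Hypothetical helper: prints the URL with one trailing "/" removed, if present.
strip_trailing_slash() {
    printf '%s\n' "${1%/}"
}

url='http://localhost:8000/'
if ! is_valid_filemgr_url "$url"; then
    # The "/" after the port is not a digit, so the pattern cannot match.
    url=$(strip_trailing_slash "$url")
fi
echo "--filemgrUrl $url"
```

With the value from the script in the thread, this prints "--filemgrUrl http://localhost:8000", which is the form suggested earlier.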
