Hi Yunhee,
On Thu, Aug 9, 2012 at 8:19 PM, YunHee Kang <yunh.k...@gmail.com> wrote:
> Hi Sheryl,
>
> First off, I tried to run crawler_launcher with the option "-autoPC".
> Then I got a warning message as follows:
>
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product:
> [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5]
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file
> /home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product:
> [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp]
>
> I think the warning message is related to the preconditions for ingest.
> Judging from my run script for crawler_launcher, I may have described the
> "-pids" option for the preconditions incorrectly:
>
> #!/bin/sh
> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> ./crawler_launcher \
>   -op -stdPC \
>   -mfx tmp \
>   --productPath $STAGE_AREA \
>   --filemgrUrl http://localhost:8000 \
>   --failureDir /tmp \
>   --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>   --metFileExtension tmp \
>   -pids CheckThatDataFileSizeIsGreaterThanZero \
>   --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>
> Let me know how to fix the warning.

I see that your data file is *.he5 and the metadata file is *.he5.info.tmp.
Specify your '-mfx' option as 'info.tmp': StdProductCrawler adds your met
file extension to the absolute path of the data file. Try that and see if it
ingests the data file. I should have noticed this before, but I only caught
it after testing it out.
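Concretely, applying that suggestion to the launch script from the message above would give something like the following. This is a sketch, not a verified script: the paths come from Yunhee's original script, and it assumes '-mfx' is simply the short form of '--metFileExtension' (so only one of the two needs to be passed).

```sh
#!/bin/sh
# Sketch of the corrected StdProductCrawler launch script. The substantive
# change is the met file extension: 'info.tmp' instead of 'tmp', so that the
# crawler looks for <data-file>.info.tmp next to each *.he5 data file.
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
  -op -stdPC \
  -mfx info.tmp \
  --productPath $STAGE_AREA \
  --filemgrUrl http://localhost:8000 \
  --failureDir /tmp \
  --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
  -pids CheckThatDataFileSizeIsGreaterThanZero \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
```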
> Next I applied the metadata crawler option to the run script:
>
> #!/bin/sh
> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> ./crawler_launcher \
>   -op -metPC \
>   -pp $STAGE_AREA \
>   -fm http://localhost:8000 \
>   -mxc ../policy/crawler-config.xml \
>   -mx org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
>   -mxr ../policy/mime-extractor-map.xml \
>   --failureDir /tmp \
>   --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>   --metFileExtension tmp \
>   --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>
> This time I got the following error message:
>
> ERROR: Failed to launch crawler : Error creating bean with name
> 'MetExtractorProductCrawler' defined in file
> [/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/bin/../policy/crawler-beans.xml]:
> Error setting property values; nested exception is
> org.springframework.beans.PropertyBatchUpdateException; nested
> PropertyAccessExceptions (1) are:
> PropertyAccessException 1:
> org.springframework.beans.MethodInvocationException: Property
> 'metExtractor' threw exception; nested exception is
> org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Failed
> to parse config file : Failed to parser
> '/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/policy/crawler-config.xml'
> : null
>
> I just used the crawler-config.xml file (shown below) from the policy
> directory:
>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xmlns:p="http://www.springframework.org/schema/p"
>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>                            http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
>   <bean class="org.apache.oodt.cas.crawl.util.CasPropertyOverrideConfigurer" />
>   <import resource="crawler-beans.xml" />
>   <import resource="action-beans.xml" />
>   <import resource="precondition-beans.xml" />
>   <import resource="naming-beans.xml" />
> </beans>

Your met-extractor config (the '-mxc' option) should be a config file for
your external met-extractor, and will look like this:
https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
The crawler-config.xml is used by crawler_launcher to read all the actions,
preconditions, etc. I've not defined or used an external met-extractor
before, but you can see an example of an extern met-extractor and its config
in the wiki:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

> So I need to understand how to write some XML files (including
> crawler-beans.xml, action-beans.xml, etc.) that are imported into
> crawler-config.xml.
> Could you share your experience with me?
>
> Thanks,
> Yunhee

Yep, you should write the above-mentioned extractor config file for your
specific external met-extractor. But you don't have to write crawler-beans
or action-beans: you can just pick the action ids you want with the
crawler_launcher CLI option '-actionIds' (or '-ais'), and you can see these
listed in action-beans.xml. The same applies to crawler-beans and the
preconditions.

2012/8/10 Sheryl John <shery...@gmail.com>:
> >
> > Hi Yunhee,
> >
> > What are the error messages you get while running the crawler?
> >
> > I've faced similar issues with the crawler when I tried it out the
> > first time too.
> > I went through the crawler user guide to understand the architecture,
> > and then understood how it worked only after running the crawler
> > several times to ingest files.
> > I agree we need to update the guide, and if you want to know about the
> > MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page
> > that I mentioned before will give you an idea how to get them working
> > (it mentions the config files that you need to write for those two
> > crawlers).
> >
> >
> > On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang <yunh.k...@gmail.com> wrote:
> >
> >> Hi Chris,
> >>
> >> I got a bunch of error messages when running the crawler_launcher
> >> script.
> >> First off, I think I need to understand how a crawler works.
> >> Can I get some materials to help me write configuration files for
> >> crawler_launcher?
> >>
> >> Honestly, I am not familiar with the Crawler.
> >> But I will try to file a JIRA issue to update the Crawler user guide.
> >>
> >> Thanks,
> >> Yunhee
> >>
> >>
> >> 2012/8/9 Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov>:
> >> > Hi YunHee,
> >> >
> >> > Sorry, we need to update the docs, that is for sure. Can you help
> >> > us remember by filing a JIRA issue to update the Crawler user
> >> > guide and to fix the URL there?
> >> >
> >> > As for crawlerId, yes, it's obsolete; you can find the modern
> >> > 0.4 and 0.5-trunk options by running ./crawler_launcher -h
> >> >
> >> > Cheers,
> >> > Chris
> >> >
> >> > On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:
> >> >
> >> >> Hi Chris and Sheryl,
> >> >>
> >> >> I understood my mistake after removing the trailing "/" from the URL.
> >> >> But that wrong URL is still used as an option of crawler_launcher on
> >> >> the Apache OODT homepage
> >> >> (http://oodt.apache.org/components/maven/crawler/user/):
> >> >> --filemgrUrl http://localhost:9000/ \
> >> >> So it confused me.
> >> >>
> >> >> I tried to run the command mentioned below according to the Apache
> >> >> OODT home page:
> >> >> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
> >> >> ERROR: Invalid option: 'crawlerId'
> >> >>
> >> >> But the error described above occurred.
> >> >> Is the option 'crawlerId' obsolete?
> >> >>
> >> >> Thanks,
> >> >> Yunhee
> >> >>
> >> >>
> >> >> 2012/8/7 Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov>:
> >> >>> Perfect, Sheryl, my thoughts exactly.
> >> >>>
> >> >>> Cheers,
> >> >>> Chris
> >> >>>
> >> >>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
> >> >>>
> >> >>>> Hi Yunhee,
> >> >>>>
> >> >>>> Check out this OODT wiki for the crawler:
> >> >>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> >> >>>>
> >> >>>> Did you try giving 'http://localhost:8000' without the "/" at the
> >> >>>> end?
> >> >>>> Also, specify
> >> >>>> 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
> >> >>>> for the 'clientTransferer' option.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <yunh.k...@gmail.com> wrote:
> >> >>>>
> >> >>>>> Hi Chris,
> >> >>>>>
> >> >>>>> I got an error message when I tried to run crawler_launcher using
> >> >>>>> a shell script. The error message may be caused by a wrong URL
> >> >>>>> for the filemgr.
> >> >>>>> $ ./crawler_launcher.sh
> >> >>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is
> >> >>>>> not allowed for option
> >> >>>>> [longOption='filemgrUrl',shortOption='fm',description='File Manager
> >> >>>>> URL'] - Allowed values = [http://.*:\d*]
> >> >>>>>
> >> >>>>> The following is the shell script that I wrote:
> >> >>>>> $ cat crawler_launcher.sh
> >> >>>>> #!/bin/sh
> >> >>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> >> >>>>> ./crawler_launcher \
> >> >>>>>   -op --launchStdCrawler \
> >> >>>>>   --productPath $STAGE_AREA \
> >> >>>>>   --filemgrUrl http://localhost:8000/ \
> >> >>>>>   --failureDir /tmp \
> >> >>>>>   --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
> >> >>>>>   --metFileExtension tmp \
> >> >>>>>   --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
> >> >>>>>
> >> >>>>> I am wondering if there is a problem in the URL of the filemgr or
> >> >>>>> elsewhere.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Yunhee
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> -Sheryl
> >> >>>
> >> >>>
> >> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>> Chris Mattmann, Ph.D.
> >> >>> Senior Computer Scientist
> >> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> >>> Office: 171-266B, Mailstop: 171-246
> >> >>> Email: chris.a.mattm...@nasa.gov
> >> >>> WWW: http://sunset.usc.edu/~mattmann/
> >> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>> Adjunct Assistant Professor, Computer Science Department
> >> >>> University of Southern California, Los Angeles, CA 90089 USA
> >> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>
> >> >
> >> >
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Chris Mattmann, Ph.D.
> >> > Senior Computer Scientist
> >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> > Office: 171-266B, Mailstop: 171-246
> >> > Email: chris.a.mattm...@nasa.gov
> >> > WWW: http://sunset.usc.edu/~mattmann/
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Adjunct Assistant Professor, Computer Science Department
> >> > University of Southern California, Los Angeles, CA 90089 USA
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >
> >> >
>
>
>
> --
> -Sheryl

--
-Sheryl
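A side note on the validation error quoted earlier in the thread: the allowed-values pattern for '--filemgrUrl' is http://.*:\d*, and it appears to be matched against the entire value, which is why the trailing slash in 'http://localhost:8000/' is rejected. You can check a URL against an equivalent anchored pattern before launching the crawler; this is a sketch using grep (the real validation happens inside the crawler CLI, and the full-match assumption is inferred from the error message, not from the source).

```sh
#!/bin/sh
# Validate a File Manager URL against a pattern equivalent to the one shown
# in the crawler's error message (http://.*:\d*), anchored to the full value.
check_filemgr_url() {
  if printf '%s\n' "$1" | grep -Eq '^http://.*:[0-9]*$'; then
    echo "valid"
  else
    echo "invalid"
  fi
}

check_filemgr_url "http://localhost:8000"   # valid
check_filemgr_url "http://localhost:8000/"  # invalid: trailing slash
```

The pattern requires the value to end in a colon followed only by digits, so any character after the port number (including "/") causes the full match to fail.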