Hey Rishi,

Do what Chris said AND add a CmdLineAction to cmd-line-action.xml to run your crawler (should just be a copy/paste and change some names).
-Brian

On Apr 25, 2012, at 10:12 PM, "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Rishi,
>
> I think you need to change the actionRepo, something akin to this:
>
> 1. Edit your crawler_launcher script:
>    - make sure that the -Dorg.apache.oodt.cas.crawl.bean.repo is set, e.g., to
>      something like file:/path/to/crawler/policy/crawler-config.xml
>
> 2. Make sure that /path/to/crawler/policy/crawler-config.xml has the
>    configuration that you are trying to override (e.g., your new bean
>    definitions).
>
> HTH!
>
> Cheers,
> Chris
>
> On Apr 25, 2012, at 4:36 PM, Verma, Rishi (388J) wrote:
>
>> Hi all,
>>
>> I wrote a custom cas-crawler ProductCrawler, but I'm having some difficulty
>> registering my custom product crawler with cas-crawler.
>>
>> I created a product crawler by extending StdProductCrawler, and I've added
>> this product-crawler name to crawler config files (following the example of
>> StdProductCrawler):
>> * crawler/policy/crawler-beans.xml
>> * crawler/policy/cmd-line-option-beans.xml
>>
>> However, after running the below command, I can clearly see my custom
>> product crawler (called LabCASProductCrawler) is not available. A crawler
>> ingest try also tells me that there is no "bean" by the name of my
>> "LabCASProductCrawler" available:
>>> bash-3.2$ ./crawler_launcher --printSupportedCrawlers
>> ProductCrawlers:
>> Id: StdProductCrawler
>> Id: MetExtractorProductCrawler
>> Id: AutoDetectProductCrawler
>>
>>> ./crawler_launcher --crawlerId LabCASProductCrawler --filemgrUrl
>>> http://localhost:9000 --productPath /data/staging/HGHAGA9 --failureDir
>>> /tmp/failed_ingest --metFileExtension met --clientTransferer
>>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>> Failed to parse options : No bean named 'LabCASProductCrawler' is defined
>>
>> I noticed in files like crawler-config.xml and cmd-line-option-beans.xml,
>> there were references made to crawler config files stored in the cas-crawler
>> JAR. Looking more into this, it seems to me that crawler is pre-loading
>> config files directly from that JAR and overshadowing any of my config
>> changes:
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-beans.xml
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-config.xml
>>
>> So two questions:
>> 1. Am I editing the correct policy files, in order to register my custom
>>    product crawler with cas-crawler?
>> 2. It seems the cas-crawler JAR contains crawler config files that take
>>    greater precedence than the ones available for editing under
>>    crawler/policy. Is there a way around this?
>>
>> Thanks!
>> rishi
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
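
For anyone hitting the same "No bean named ... is defined" error, a quick sanity check is to list the bean ids that a given policy file actually declares, so you can confirm the file passed via -Dorg.apache.oodt.cas.crawl.bean.repo contains your crawler's id. The following is only a rough sketch and not part of OODT: the helper name bean_ids, the sample XML, and the example.LabCASProductCrawler class name are made up for illustration, and the StdProductCrawler class name is assumed from the package path quoted above. It reads a single file and does not follow any imports that file makes, so beans pulled in from the JAR will not show up.

```python
import xml.etree.ElementTree as ET


def bean_ids(xml_text):
    """Return the id of every <bean> element that has one, in document order."""
    root = ET.fromstring(xml_text)
    ids = []
    for elem in root.iter():
        # Drop a leading '{namespace}' so namespaced and plain files both work.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "bean" and "id" in elem.attrib:
            ids.append(elem.attrib["id"])
    return ids


# Hypothetical policy file contents, for illustration only.
SAMPLE = """<beans xmlns="http://www.springframework.org/schema/beans">
  <bean id="StdProductCrawler"
        class="org.apache.oodt.cas.crawl.StdProductCrawler"/>
  <bean id="LabCASProductCrawler"
        class="example.LabCASProductCrawler"/>
</beans>"""

found = bean_ids(SAMPLE)
print("LabCASProductCrawler defined:", "LabCASProductCrawler" in found)
```

To check a real file, read it and pass the text in, e.g. bean_ids(open("/path/to/crawler/policy/crawler-beans.xml").read()). If the id is missing there, the launcher is unlikely to find it either.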