Hey Rishi,

Do what Chris said AND add a CmdLineAction to cmd-line-action.xml to run your 
crawler (it should just be a copy/paste with some names changed)
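
As a rough sketch of the "new bean definitions" mentioned in step 2 below, 
assuming the policy files use standard Spring bean XML: the package name 
gov.example.labcas is a placeholder, and any properties set on the existing 
StdProductCrawler entry (which you would copy over) are left out here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the package gov.example.labcas is hypothetical. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- The id must match the value passed to the crawlerId option. -->
  <!-- Copy any property elements from the StdProductCrawler entry. -->
  <bean id="LabCASProductCrawler"
        class="gov.example.labcas.LabCASProductCrawler"
        lazy-init="true"/>

</beans>
```

The bean id is what the "No bean named 'LabCASProductCrawler' is defined" 
error is looking for, so it has to be visible in whichever bean repo the 
launcher actually loads.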

-Brian

On Apr 25, 2012, at 10:12 PM, "Mattmann, Chris A (388J)" 
<chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Rishi,
> 
> I think you need to change the actionRepo, something akin to this:
> 
> 1. Edit your crawler_launcher script:
> - make sure that the -Dorg.apache.oodt.cas.crawl.bean.repo is set, e.g., to 
> something like file:/path/to/crawler/policy/crawler-config.xml
> 
> 2. Make sure that /path/to/crawler/policy/crawler-config.xml has the 
> configuration that you are trying to override (e.g., your new bean 
> definitions).
> 
> HTH!
> 
> Cheers,
> Chris
> 
> On Apr 25, 2012, at 4:36 PM, Verma, Rishi (388J) wrote:
> 
>> Hi all,
>> 
>> I wrote a custom cas-crawler ProductCrawler, but I'm having some difficulty 
>> registering my custom product crawler with cas-crawler.
>> 
>> I created a product crawler by extending StdProductCrawler, and I've added 
>> this product-crawler name to crawler config files (following the example of 
>> StdProductCrawler):
>> * crawler/policy/crawler-beans.xml
>> * crawler/policy/cmd-line-option-beans.xml
>> 
>> However, after running the command below, I can see that my custom 
>> product crawler (called LabCASProductCrawler) is not available. An 
>> attempted crawler ingest also tells me that there is no bean named 
>> "LabCASProductCrawler":
>>> bash-3.2$ ./crawler_launcher --printSupportedCrawlers
>> ProductCrawlers:
>> Id: StdProductCrawler
>> Id: MetExtractorProductCrawler
>> Id: AutoDetectProductCrawler
>> 
>>> ./crawler_launcher --crawlerId LabCASProductCrawler --filemgrUrl 
>>> http://localhost:9000 --productPath /data/staging/HGHAGA9 --failureDir 
>>> /tmp/failed_ingest --metFileExtension met --clientTransferer 
>>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>> Failed to parse options : No bean named 'LabCASProductCrawler' is defined
>> 
>> I noticed that files like crawler-config.xml and cmd-line-option-beans.xml 
>> reference crawler config files stored in the cas-crawler JAR. Looking into 
>> this further, it seems that the crawler is pre-loading config files 
>> directly from that JAR and overriding any of my config changes:
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-beans.xml
>> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-config.xml
>> 
>> So two questions:
>> 1. Am I editing the correct policy files, in order to register my custom 
>> product crawler with cas-crawler?
>> 2. It seems the cas-crawler JAR contains crawler config files that take 
>> precedence over the ones available for editing under crawler/policy. 
>> Is there a way around this?
>> 
>> Thanks!
>> rishi
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
