Never mind... it looks like you are using 0.3 instead of trunk. What I added 
applies to the trunk crawler.

-Brian

On Apr 25, 2012, at 4:36 PM, "Verma, Rishi (388J)" <rishi.ve...@jpl.nasa.gov> 
wrote:

> Hi all,
> 
> I wrote a custom cas-crawler ProductCrawler, but I'm having some difficulty 
> registering my custom product crawler with cas-crawler.
> 
> I created a product crawler by extending StdProductCrawler, and I've added 
> this product-crawler name to crawler config files (following the example of 
> StdProductCrawler):
> * crawler/policy/crawler-beans.xml
> * crawler/policy/cmd-line-option-beans.xml
> 
> However, after running the command below, I can clearly see that my custom 
> product crawler (called LabCASProductCrawler) is not available. An ingest 
> attempt also tells me that no "bean" named "LabCASProductCrawler" is 
> available:
>> bash-3.2$ ./crawler_launcher --printSupportedCrawlers
> ProductCrawlers:
>  Id: StdProductCrawler
>  Id: MetExtractorProductCrawler
>  Id: AutoDetectProductCrawler
> 
>> ./crawler_launcher --crawlerId LabCASProductCrawler --filemgrUrl 
>> http://localhost:9000 --productPath /data/staging/HGHAGA9 --failureDir 
>> /tmp/failed_ingest --metFileExtension met --clientTransferer 
>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
> Failed to parse options : No bean named 'LabCASProductCrawler' is defined
> 
> I noticed that files like crawler-config.xml and cmd-line-option-beans.xml 
> reference crawler config files stored in the cas-crawler JAR. Looking into 
> this further, it seems to me that the crawler is pre-loading config files 
> directly from that JAR and overriding any of my config changes:
> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-beans.xml
> * crawler/lib/cas-crawler-0.3.jar:org/apache/oodt/cas/crawl/crawler-config.xml
> 
> So two questions:
> 1. Am I editing the correct policy files, in order to register my custom 
> product crawler with cas-crawler?
> 2. It seems the cas-crawler JAR contains crawler config files that take 
> precedence over the ones available for editing under crawler/policy. 
> Is there a way around this?
> 
> Thanks!
> rishi
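If the launcher resolves these files as classpath resources (as the cas-crawler-0.3.jar:... paths quoted above suggest, though this thread does not confirm it), one way to see which copy is visible is to list every match on the classpath. The following is a minimal Java sketch that uses only the standard ClassLoader.getResources API. The class name FindClasspathCopies is hypothetical and is not part of cas-crawler.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of cas-crawler: lists every classpath
// location that provides a given resource name.
public class FindClasspathCopies {

    // Returns every URL at which the named resource is visible to the given
    // class loader, in the order the loader enumerates them.
    static List<URL> findCopies(ClassLoader loader, String resourceName) throws IOException {
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // Default resource path taken from the cas-crawler-0.3.jar listing above.
        String name = args.length > 0 ? args[0] : "org/apache/oodt/cas/crawl/crawler-beans.xml";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<URL> copies = findCopies(loader, name);
        if (copies.isEmpty()) {
            System.out.println("No copy of " + name + " on the classpath");
            return;
        }
        // With the standard application class loader, getResource(name)
        // normally returns the first URL printed here.
        for (URL url : copies) {
            System.out.println(url);
        }
    }
}
```

For example, compiling it and running java -cp "crawler/lib/*:." FindClasspathCopies from the crawler install directory would print each location that supplies that resource path. Note that crawler/policy/crawler-beans.xml lives at a different path, so it would only show up here if the launcher were configured to look for it under that resource name.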
