Perhaps "regain" would be a solution for you?

http://regain.sourceforge.net/?lang=en

Regards 
Markus


rhodebump wrote:
> 
> I posted this on the lucene list a week ago and haven't heard anything, so 
> please don't give me the cross-post slap;)
> 
> I am successfully using lucene in our application to index 12 different
> types of objects located in a database, and their relationships to each
> other to provide some nice search functionality for our website.  We are
> building lots of lucene queries programmatically to filter based upon
> categories, regions, zip codes, scoring, long/lats...
> 
> My problem is that we also have a lot of content that is not in the
> database (3000+ pages) that we need to include in the search results.
> It's a whole lot of JSPs.
> 
> As I see this, I can either
> a) Migrate this application to nutch
> b) Write/Implement a web crawler to crawl our site and inject the crawl
> results into our lucene index.
> 
> I am leaning towards option B, since I think it would only take me a
> couple of days to implement/write a simple crawler and I wouldn't have
> to change much else.
> 
> Can anyone think of any points/counterpoints for using Nutch vs. writing a
> crawler to extend our already used lucene framework?
> 
> Thanks. 
> 
> 
> 
> 
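
For what it's worth, option (b) from the message above could be sketched roughly as follows. This is only an illustration under assumptions, not anyone's actual implementation: the class name SimpleSiteCrawler, the fetcher and indexer callbacks, and the regex-based link extraction are all hypothetical. In a real setup the fetcher would do the HTTP GET and the indexer would turn the page into a Lucene document and add it to the existing index; both are left as plug-in functions here so the sketch stays independent of any Lucene version.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: breadth-first crawl of a single host.
class SimpleSiteCrawler {
    // Naive href matcher (assumption: good enough for simple pages);
    // a real crawler would use a proper HTML parser. Fragments after '#' are dropped.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Extracts href values from the HTML and resolves them against the page URL.
    static List<URI> extractLinks(URI page, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(page.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    // Crawls from 'start', staying on the start URL's host.
    // 'fetcher' returns the page body or null on failure; 'indexer' receives each fetched page.
    // Returns every URL that was attempted, in visit order.
    static Set<URI> crawl(URI start, int maxPages,
                          Function<URI, String> fetcher,
                          BiConsumer<URI, String> indexer) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI url = queue.poll();
            if (!visited.add(url)) {
                continue; // already attempted
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // fetch failed, nothing to index
            }
            indexer.accept(url, html); // e.g. add a document to the Lucene index here
            for (URI link : extractLinks(url, html)) {
                boolean sameHost = start.getHost() != null
                    && start.getHost().equals(link.getHost());
                if (sameHost && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

Keeping fetching and indexing behind callbacks means the existing programmatic Lucene code would not need to change; only the indexer callback has to know about the index.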


