Maybe "regain" would be a solution for you: http://regain.sourceforge.net/?lang=en
Regards
Markus

rhodebump wrote:
>
> I posted this on the lucene list a week ago and haven't heard anything, so
> please don't give me the cross-post slap ;)
>
> I am successfully using lucene in our application to index 12 different
> types of objects located in a database, and their relationships to each
> other, to provide some nice search functionality for our website. We are
> building lots of lucene queries programmatically to filter based upon
> categories, regions, zip codes, scoring, long/lats...
>
> My problem is that we have a lot of content that is not in the database
> (about 3000+ pages) that we also need to include in the search results.
> It's a whole lot of jsp's.
>
> As I see this, I can either
> a) Migrate this application to nutch
> b) Write/Implement a web crawler to crawl our site and inject the crawl
>    results into our lucene index.
>
> I am leaning towards option B, since I think it would only take me a
> couple of days to implement/write a simple crawler and I wouldn't have
> to change much else.
>
> Can anyone think of any points/counterpoints for using Nutch vs. writing a
> crawler to extend our already used lucene framework?
>
> Thanks.
>

--
View this message in context: http://www.nabble.com/Implement-crawler-with-custom-lucene-VS--use-nutch--tf3157478.html#a8804698
Sent from the Nutch - User mailing list archive at Nabble.com.
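
For reference, option b) from the quoted message could start from something like the sketch below. It is only an illustration under assumptions, not anyone's actual implementation: the names SimpleCrawler, crawl and fetcher are made up here, links are pulled out with a naive regex rather than a real HTML parser, and the crawl stays on the host of the start URL. The Lucene side is deliberately left out; each (URL, text) entry that comes back would be turned into a document by whatever indexing code the application already has.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or Nutch.
class SimpleCrawler {
    // Naive href extractor (ignores fragments); a real crawler would use an HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /**
     * Breadth-first crawl restricted to the host of startUrl.
     * fetcher maps a URL to its HTML, or returns null if the page is unavailable.
     * Returns URL -> tag-stripped text, in the order pages were visited.
     */
    static Map<String, String> crawl(String startUrl,
                                     Function<String, String> fetcher,
                                     int maxPages) {
        String host = URI.create(startUrl).getHost();
        Map<String, String> pages = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);

        while (!queue.isEmpty() && pages.size() < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // page could not be fetched
            }
            // Strip tags and collapse whitespace to get indexable text.
            String text = TAG.matcher(html).replaceAll(" ")
                             .replaceAll("\\s+", " ").trim();
            pages.put(url, text);

            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link;
                try {
                    link = URI.create(url).resolve(m.group(1).trim()).toString();
                } catch (IllegalArgumentException e) {
                    continue; // skip malformed links
                }
                String linkHost = URI.create(link).getHost();
                // Follow only unseen links on the same host.
                if (host != null && host.equals(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the site; a real fetcher would do an HTTP GET.
        Map<String, String> site = Map.of(
            "http://example.com/", "<a href=\"/a.jsp\">A</a>",
            "http://example.com/a.jsp", "<p>Hello <b>world</b></p>");
        crawl("http://example.com/", site::get, 100)
            .forEach((url, text) -> System.out.println(url + " -> " + text));
    }
}
```

Passing the fetcher in as a function keeps the traversal testable without network access. In a real setup it would wrap an HTTP client, and the maxPages cap would need to sit comfortably above the roughly 3000 pages mentioned in the quoted message.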
