Re: JSP files
Additionally, you can use a crawler to crawl your site and then index the resulting files. Lucene comes with a crawler called LARM, but the current make file doesn't build it properly. I ended up using a different crawler called WebSPHINX: http://www-2.cs.cmu.edu/~rcm/websphinx/

Pinky,

You don't want to index the JSP files directly, as you would be missing the content inserted by the server when the pages are accessed. Indexing dynamic pages is typically problematic, since the content changes frequently.

That being said, the java.net and java.io libraries provide classes for retrieving the content of a URL as an input stream. You can write a class that traverses your site, downloading the URLs and indexing them. It will of course be slower than reading HTML from disk files.

-Tom

--- Pinky Iyer [EMAIL PROTECTED] wrote:

Hi all!

Is there a separate parser for JSP files? Any option other than modifying the IndexHTML.java class is appreciated. I already tried modifying this class: HTML parsing is fine, but JSP parsing leaves all the JSP tags in the summary...

Thanks!
Pinky

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
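Tom's suggestion of reading a page through its URL rather than from the JSP source could look roughly like the sketch below. It is only an illustration: the class name UrlFetcher and the method fetch are made up for this example and are not part of Lucene, and the code assumes the pages are served as UTF-8. It uses java.net.URL.openStream() to obtain the input stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: downloads the rendered output of a page.
class UrlFetcher {

    // Reads everything returned for the given URL into a String.
    // Assumption: the content is UTF-8; a real crawler would honor the response charset.
    static String fetch(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the address of a rendered page, e.g. a JSP URL on your own server.
        String html = fetch(URI.create(args[0]).toURL());
        System.out.println(html.length() + " characters fetched");
    }
}
```

The returned string is the HTML the server generated, so it can be handed to whatever HTML parsing step the indexer already uses. Traversing the site (following links, avoiding repeat visits) is left out here; that is the part a crawler such as WebSPHINX takes care of.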
Re: Anyone have experience building LARM?
I was totally unsuccessful at building it and basically gave up. If you want, I can send you the build output showing how it failed. Let me know. Thanks.

- Original Message -
From: Clemens Marschner [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Sunday, March 23, 2003 9:48 AM
Subject: Re: Anyone have experience building LARM?

There seems to have been a problem for quite some time now. I'll try to figure this out tomorrow evening (GMT+1), ok?

Clemens Marschner

- Original Message -
From: John Bresnik [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, March 22, 2003 2:05 AM
Subject: Anyone have experience building LARM?

Sorry, this is all a little new to me, but it looks like I am getting this error [among the 300 or so others]:

[javac] D:\Jakarta\jakarta-lucene-sandbox\contributions\webcrawler-LARM\buid\src\HTTPClient\alt\HotJava\HTTPClient\HTTPResponse.java:57: duplicate class: HTTPClient.HTTPResponse
[javac] public class HTTPResponse implements GlobalConstants, HTTPClientModuleConstants
[javac] ^

Any ideas why I would get this? According to the docs I have to have HTTPClient installed [which I do].

Thanks
org.apache.lucene.demo.IndexHTML - parse JSP files?
Does anyone know of a quick and easy way to get this demo [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a crawler to create a local [static] version of the site [i.e. they are no longer JSP files, just the HTML output from the original JSP files], but in the interest of keeping the URLs intact, I need to parse files with the JSP extension. The short question is: does anyone know of a way to *not* ignore the *.jsp files?

Thanks.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: org.apache.lucene.demo.IndexHTML - parse JSP files?
Ah, thanks. I couldn't find the demo classes [turns out they were in a different dir]. Thanks.

- Original Message -
From: Michael Wechner [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 24, 2003 5:03 PM
Subject: Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

John Bresnik wrote:

Does anyone know of a quick and easy way to get this demo [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a crawler to create a local [static] version of the site [i.e. they are no longer JSP files, just the HTML output from the original JSP files], but in the interest of keeping the URLs intact, I need to parse files with the JSP extension. The short question is: does anyone know of a way to *not* ignore the *.jsp files? Thanks.

Just modify IndexHTML: there is one line in there which decides which extensions it will index.

HTH
Michael
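For anyone finding this thread later, the kind of extension check Michael describes might look something like the sketch below. This is not the actual IndexHTML source: the class IndexableExtensions, the method isIndexable, and the list of extensions are all assumptions made for illustration. The idea is simply to add ".jsp" to whatever extensions the demo already accepts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the extension check; not the actual IndexHTML code.
class IndexableExtensions {

    // Assumed list of accepted extensions; ".jsp" is the addition discussed in this thread.
    static final List<String> EXTENSIONS = Arrays.asList(".html", ".htm", ".txt", ".jsp");

    // Returns true if the path ends with one of the accepted extensions (case-insensitive).
    static boolean isIndexable(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        for (String ext : EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isIndexable("docs/index.jsp"));  // prints "true"
        System.out.println(isIndexable("images/logo.gif")); // prints "false"
    }
}
```

Note that this only makes sense for files that already contain rendered HTML, as in John's crawled copy of the site; raw JSP source would still carry its JSP tags into the index.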
LARM source?
Hello,

I have tried downloading the LARM source from the lucene-sandbox, but there appears to be nothing there. Any suggestions [or simply emailing me the source] would be helpful.

Thanks.
John
Re: LARM source?
Got it, thanks.

- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, March 21, 2003 2:38 PM
Subject: Re: LARM source?

You have to get it out of CVS directly. It is in there.

Otis

--- John Bresnik [EMAIL PROTECTED] wrote:

Hello,

I have tried downloading the LARM source from the lucene-sandbox, but there appears to be nothing there. Any suggestions [or simply emailing me the source] would be helpful.

Thanks.
John
Anyone have experience building LARM?
Sorry, this is all a little new to me, but it looks like I am getting this error [among the 300 or so others]:

[javac] D:\Jakarta\jakarta-lucene-sandbox\contributions\webcrawler-LARM\buid\src\HTTPClient\alt\HotJava\HTTPClient\HTTPResponse.java:57: duplicate class: HTTPClient.HTTPResponse
[javac] public class HTTPResponse implements GlobalConstants, HTTPClientModuleConstants
[javac] ^

Any ideas why I would get this? According to the docs I have to have HTTPClient installed [which I do].

Thanks