Re: JSP files

2003-04-03 Thread John Bresnik

Additionally, you can use a crawler to crawl your site and then index the
resulting files. Lucene comes with a crawler called LARM, but the current
build file doesn't build it properly. I ended up using a different crawler
called WebSPHINX:

http://www-2.cs.cmu.edu/~rcm/websphinx/





 Pinky,

 You don't want to index the JSP files directly, as you would
 be missing the content inserted by the server when the
 pages are accessed. Indexing dynamic pages is typically
 problematic since the content changes frequently. That
 being said, the java.net package provides classes (such
 as URL) for retrieving the content of a URL as an input
 stream. You can write a class that traverses your site,
 downloading the URLs and indexing them. It will of course
 be slower than reading HTML from disk files.

 -Tom
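
A minimal sketch of the approach Tom describes: opening a URL and reading the server-generated output as a stream. The class and method names (UrlFetcher, fetch) are hypothetical and not part of Lucene, and the code assumes the pages are UTF-8 text. The returned string would then be handed to whatever indexing code you already use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or its demo.
class UrlFetcher {

    // Reads everything returned for the given URL into a String.
    // Assumption: the content is UTF-8 text. A real crawler would use the
    // charset announced in the Content-Type header instead.
    static String fetch(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }
}
```

Calling UrlFetcher.fetch with the URL of a JSP page returns the HTML as the server rendered it, so the JSP tags themselves never reach the indexer.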

 --- Pinky Iyer [EMAIL PROTECTED] wrote:
 
  Hi all!
  Is there a separate parser for JSP files? Any option
  other than modifying the IndexHTML.java class would be
  appreciated. I already tried modifying this class: HTML
  parsing is fine, but JSP parsing yields all the JSP tags
  in the summary as well...
  Thanks!
  Pinky
 
 
 



 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]








Re: Anyone have experience building LARM?

2003-03-25 Thread John Bresnik
I was totally unsuccessful at building it and basically gave up. If you want,
I can send you the build output showing how it failed. Let me know.
Thanks.
- Original Message -
From: Clemens Marschner [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Sunday, March 23, 2003 9:48 AM
Subject: Re: Anyone have experience building LARM?


 There seems to be a problem for quite some time now. I'll try to figure
 this out tomorrow evening (GMT+1), ok?


 Clemens Marschner

 - Original Message -
 From: John Bresnik [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Saturday, March 22, 2003 2:05 AM
 Subject: Anyone have experience building LARM?


  Sorry, this is all a little new to me, but it looks like I am getting this
  error [among the 300 or so others]:

  [javac] D:\Jakarta\jakarta-lucene-sandbox\contributions\webcrawler-LARM\buid\src\HTTPClient\alt\HotJava\HTTPClient\HTTPResponse.java:57: duplicate class: HTTPClient.HTTPResponse
  [javac] public class HTTPResponse implements GlobalConstants, HTTPClientModuleConstants
  [javac]^

  Any ideas why I would get this? According to the docs I have to have
  HTTPClient installed [which I do].
  Thanks
 
 
 
 
 










org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread John Bresnik
Does anyone know of a quick and easy way to get this demo
[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
crawler to create a local [static] version of the site, i.e. they are no
longer JSP files, just the HTML output of the original JSP files. But in
the interest of keeping the URLs intact, I need to parse files with the JSP
extension. The short question is: does anyone know of a way to *not* ignore
the *.jsp files?

thanks.






Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread John Bresnik
Ah, thanks. I couldn't find the demo classes [turns out they were in a
different dir].

- Original Message -
From: Michael Wechner [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, March 24, 2003 5:03 PM
Subject: Re: org.apache.lucene.demo.IndexHTML - parse JSP files?


 John Bresnik wrote:

 Does anyone know of a quick and easy way to get this demo
 [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
 crawler to create a local [static] version of the site, i.e. they are no
 longer JSP files, just the HTML output of the original JSP files. But in
 the interest of keeping the URLs intact, I need to parse files with the JSP
 extension. The short question is: does anyone know of a way to *not* ignore
 the *.jsp files?
 

 Just modify IndexHTML: there is one line in there that decides which
 extensions it will index.

 HTH

 Michael
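
A rough sketch of the kind of change Michael means, assuming the check is a simple test on the file name's suffix. The exact line is not quoted in this thread and may differ between Lucene versions. ExtensionFilter and shouldIndex are hypothetical names used only for illustration, and the lower-casing of the path is an addition of this sketch.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical stand-in for the extension test in the demo's IndexHTML;
// the actual line in the demo may look different.
class ExtensionFilter {

    // Extensions to index. ".jsp" is the addition discussed in this thread;
    // the other three are assumed defaults.
    private static final String[] INDEXED = { ".html", ".htm", ".txt", ".jsp" };

    // Returns true if the file's path ends with one of the indexed extensions.
    // The path is lower-cased first so that "INDEX.JSP" also matches.
    static boolean shouldIndex(File file) {
        String path = file.getPath().toLowerCase(Locale.ROOT);
        for (String ext : INDEXED) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

With a check like this, a mirrored file such as site/index.jsp is indexed under its original name, which keeps the stored path in line with the live URL.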

 
 thanks.
 
 
 
 
 
 
 











LARM source?

2003-03-21 Thread John Bresnik
Hello,

I have tried downloading the LARM source from the lucene-sandbox, but there
appears to be nothing there. Any suggestions [or simply emailing me the
source] would be helpful. Thanks.

John


Re: LARM source?

2003-03-21 Thread John Bresnik
got it - thanks


- Original Message - 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, March 21, 2003 2:38 PM
Subject: Re: LARM source?


 You have to get it out of CVS directly.  It is in there.
 
 Otis
 
 --- John Bresnik [EMAIL PROTECTED] wrote:
  Hello,
  
  I have tried downloading the LARM source from the lucene-sandbox, but
  there appears to be nothing there. Any suggestions [or simply
  emailing me the source] would be helpful. Thanks.
  
  John
  
 
 
 
 
 






Anyone have experience building LARM?

2003-03-21 Thread John Bresnik
Sorry, this is all a little new to me, but it looks like I am getting this
error [among the 300 or so others]:

[javac] D:\Jakarta\jakarta-lucene-sandbox\contributions\webcrawler-LARM\buid\src\HTTPClient\alt\HotJava\HTTPClient\HTTPResponse.java:57: duplicate class: HTTPClient.HTTPResponse
[javac] public class HTTPResponse implements GlobalConstants, HTTPClientModuleConstants
[javac]^


Any ideas why I would get this? According to the docs I have to have
HTTPClient installed [which I do].
Thanks



