Re: I can't make index for .jsp files with IndexHtml!

2004-12-17 Thread Benson Fang
On Sun, 5 Dec 2004 03:12:37 -0500, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
> On Dec 5, 2004, at 1:36 AM, Benson Fang wrote:
> IndexHTML has this in its source code:
> 
> if (file.getPath().endsWith(".html") || // index .html files
>     file.getPath().endsWith(".htm") || // index .htm files
>     file.getPath().endsWith(".txt"))...
> 
> This is why .jsp files are excluded.
> 
> Keep in mind that JSP files are not pure HTML (not even close in most
> cases, especially when using JSF).  I'm not sure what the built-in HTML
> parser will do with .jsp files.  Generally content shouldn't be in the
> view layer, so indexing .jsp pages may or may not be useful.  May I
> suggest switching to a Java web development framework that uses pure
> HTML templates?  Tapestry - http://jakarta.apache.org/tapestry :)
> 
> Erik

Oh, I see.

However, for some reason, I have to build an index for .jsp files. In
fact, I've rewritten IndexHTML and compiled a new class file, but I
don't know how to substitute it for the old one. Is there any way that
I can still index .jsp files?

Thank you very much!

Benson

-- 
ankh  wdj3  snb 

  "life, prosperity, health"

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: I can't make index for .jsp files with IndexHtml!

2004-12-05 Thread Erik Hatcher
On Dec 5, 2004, at 1:36 AM, Benson Fang wrote:
> I have a problem: every time I try to build an index with IndexHTML for a
> directory under Tomcat webapps that contains both .html and .jsp files, it
> only picks up the .html files and skips the .jsp files. However, indexing
> the same directory with IndexFiles works.

IndexHTML has this in its source code:

if (file.getPath().endsWith(".html") || // index .html files
    file.getPath().endsWith(".htm") || // index .htm files
    file.getPath().endsWith(".txt"))...

This is why .jsp files are excluded.

Keep in mind that JSP files are not pure HTML (not even close in most 
cases, especially when using JSF).  I'm not sure what the built-in HTML 
parser will do with .jsp files.  Generally content shouldn't be in the 
view layer, so indexing .jsp pages may or may not be useful.  May I 
suggest switching to a Java web development framework that uses pure 
HTML templates?  Tapestry - http://jakarta.apache.org/tapestry :)

Erik
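
The extension check quoted above can be sketched as a small stand-alone filter with .jsp added. This is only an illustration: the class and method names (ExtensionFilter, shouldIndex) are hypothetical and not part of the Lucene demo, and the case-insensitive match is an assumption (the quoted check is case-sensitive). It only controls which files are picked up; it does nothing about how the HTML parser copes with raw JSP markup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the Lucene demo: decides by file
// extension whether a path should be handed to the indexer.
final class ExtensionFilter {
    // The three extensions from the quoted IndexHTML check, plus .jsp.
    private static final List<String> EXTENSIONS =
            Arrays.asList(".html", ".htm", ".txt", ".jsp");

    static boolean shouldIndex(String path) {
        // Assumption: match case-insensitively; the quoted check is case-sensitive.
        String lower = path.toLowerCase(Locale.ROOT);
        for (String ext : EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

In IndexHTML itself the equivalent change would presumably be one more endsWith(".jsp") clause in the condition shown above.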


I can't make index for .jsp files with IndexHtml!

2004-12-04 Thread Benson Fang
Hello everyone,

I am Benson, and this is the first time I have asked a question on the Lucene Users List.

I have a problem: every time I try to build an index with IndexHTML for a
directory under Tomcat webapps that contains both .html and .jsp files, it
only picks up the .html files and skips the .jsp files. However, indexing
the same directory with IndexFiles works.

The command I run at the DOS prompt is as follows:
java org.apache.lucene.demo.IndexHTML -create -index /opt/lucene/index ../SampleDirectory
(where 'SampleDirectory' is the name of my target directory)

Thank you!

Benson Fang
-- 
ankh  wdj3  snb 

  "life, prosperity, health"




RE: How can I index JSP files?

2003-07-28 Thread Pitre, Russell
I think this may be exactly what I'm looking for!

Thanks a lot

Russ

I'll let you know how it works out. Thanks again!

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 28, 2003 6:56 AM
To: Lucene Users List
Subject: Re: How can I index JSP files?


You could try using a spider such as Spindle.  Don't have the URL, but
I'm sure you can find it via Google.  Spindle uses Lucene.

Otis

--- "Pitre, Russell" <[EMAIL PROTECTED]> wrote:
> Referring to this:  http://www.jguru.com/faq/view.jsp?EID=1074516
>
> "To index the content of JSPs that a user would see using a Web
> browser, you would need to write an application that acts as a Web
> client, in order to mimic the Web browser behaviour. Once you have
> such an application, you should be able to point it to the desired
> JSP, retrieve the contents that the JSP generates, parse it, and feed
> it to Lucene."
>
> I am a newbie to Lucene and I would like to add search
> capability to my website, which is written entirely with JSP and
> servlets.  Does anyone have any experience parsing JSP files in order
> to create an index for/by Lucene?  I would greatly appreciate any help
> with the matter. Thanks
>
> Russ
>





Re: How can I index JSP files?

2003-07-28 Thread Otis Gospodnetic
You could try using a spider such as Spindle.  Don't have the URL, but
I'm sure you can find it via Google.  Spindle uses Lucene.

Otis

--- "Pitre, Russell" <[EMAIL PROTECTED]> wrote:
> Referring to this:  http://www.jguru.com/faq/view.jsp?EID=1074516
>
> "To index the content of JSPs that a user would see using a Web
> browser, you would need to write an application that acts as a Web
> client, in order to mimic the Web browser behaviour. Once you have
> such an application, you should be able to point it to the desired
> JSP, retrieve the contents that the JSP generates, parse it, and feed
> it to Lucene."
>
> I am a newbie to Lucene and I would like to add search
> capability to my website, which is written entirely with JSP and
> servlets.  Does anyone have any experience parsing JSP files in order
> to create an index for/by Lucene?  I would greatly appreciate any help
> with the matter. Thanks
>
> Russ
>





Re: How can I index JSP files?

2003-07-27 Thread Leo Galambos
If I understand the Enigma code correctly, they are saying that you must
write a crawler ;-)

-g-

> "To index the content of JSPs that a user would see using a Web browser,
> you would need to write an application that acts as a Web client, in
> order to mimic the Web browser behaviour. Once you have such an
> application, you should be able to point it to the desired JSP, retrieve
> the contents that the JSP generates, parse it, and feed it to Lucene."
>
> I am a newbie to Lucene and I would like to add search capability
> to my website, which is written entirely with JSP and servlets.  Does
> anyone have any experience parsing JSP files in order to create an index
> for/by Lucene?  I would greatly appreciate any help with the matter.
> Thanks
>
> Russ

 





How can I index JSP files?

2003-07-27 Thread Pitre, Russell
Referring to this:  http://www.jguru.com/faq/view.jsp?EID=1074516

"To index the content of JSPs that a user would see using a Web browser,
you would need to write an application that acts as a Web client, in
order to mimic the Web browser behaviour. Once you have such an
application, you should be able to point it to the desired JSP, retrieve
the contents that the JSP generates, parse it, and feed it to Lucene."

I am a newbie to Lucene and I would like to add search capability
to my website, which is written entirely with JSP and servlets.  Does
anyone have any experience parsing JSP files in order to create an index
for/by Lucene?  I would greatly appreciate any help with the matter.
Thanks

 

Russ
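
The "application that acts as a Web client" described in the FAQ quote could start from a skeleton like the one below. It is a rough sketch under stated assumptions, not a real crawler: SiteWalker, extractLinks and crawl are hypothetical names, the regular expression only matches double-quoted absolute href values, and the fetch function (URL in, page text out) is supplied by the caller. The step that hands the page text to Lucene is left as a comment because it depends on the Lucene version in use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crawler skeleton: breadth-first walk over a site, starting
// from one URL and following href links that stay under the same prefix.
final class SiteWalker {
    // Simplistic href matcher; a real crawler would use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"#]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // 'fetch' maps a URL to the page text the server returns for it
    // (null if it cannot be retrieved). Returns every URL that was
    // queued, in discovery order.
    static Set<String> crawl(String start, String prefix, Function<String, String> fetch) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            // This is where the page text would be parsed and fed to the indexer.
            for (String link : extractLinks(html)) {
                // Assumption: links are absolute; relative links are ignored.
                if (link.startsWith(prefix) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

Because the fetcher is passed in, the traversal can be tried against an in-memory map of pages before pointing it at a live site.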



Re: JSP files

2003-04-03 Thread John Bresnik

Additionally - you can use a crawler to crawl your site, then index the
resulting files. Lucene comes with a crawler called "LARM" but the current
make file doesn't build it properly. I ended up using a different crawler called
WebSPHINX:

http://www-2.cs.cmu.edu/~rcm/websphinx/





> Pinky,
>
> You don't want to index the JSP directly, as you would
> be missing the content inserted by the server when the
> pages are accessed. Typically, indexing dynamic pages
> is problematic since the content will change
> frequently... That being said, the java.net package
> provides classes for retrieving the content of a URL
> as an input stream. You can write a class to traverse
> your site, downloading the URLs and indexing them. It
> will of course be slower than reading HTML from disk
> files.
>
> -Tom
>
> --- Pinky Iyer <[EMAIL PROTECTED]> wrote:
> >
> >  Hi all!
> >   Is there a separate parser for JSP files? Any
> > option other than modifying the IndexHTML.java
> > class would be appreciated. I already tried modifying this
> > class; HTML parsing is fine, but JSP parsing yields
> > all the JSP tags as well in the summary...
> > Thanks!
> > Pinky






Re: JSP files

2003-04-02 Thread Tomcat Programmer
Pinky,

You don't want to index the JSP directly, as you would
be missing the content inserted by the server when the
pages are accessed. Typically, indexing dynamic pages
is problematic since the content will change
frequently... That being said, the java.net package
provides classes for retrieving the content of a URL
as an input stream. You can write a class to traverse
your site, downloading the URLs and indexing them. It
will of course be slower than reading HTML from disk
files.

-Tom 
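
Retrieving the content of a URL as an input stream could look roughly like this. A minimal sketch only: PageFetcher and fetch are hypothetical names, UTF-8 decoding is an assumption, and there is no timeout, redirect or error handling. The URL class lives in java.net, while the stream classes it returns come from java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: downloads whatever the server returns for a URL,
// i.e. the rendered page rather than the JSP source on disk.
final class PageFetcher {
    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumption: the page is UTF-8; a real client would honour the
            // charset declared in the Content-Type header.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The returned text is what would then be passed to the HTML parser and indexed in place of the file read from disk.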

--- Pinky Iyer <[EMAIL PROTECTED]> wrote:
> 
>  Hi all!
>   Is there a separate parser for JSP files? Any
> option other than modifying the IndexHTML.java
> class would be appreciated. I already tried modifying this
> class; HTML parsing is fine, but JSP parsing yields
> all the JSP tags as well in the summary...
> Thanks!
> Pinky





JSP files

2003-04-02 Thread Pinky Iyer

 Hi all!
  Is there a separate parser for JSP files? Any option other than modifying
the IndexHTML.java class would be appreciated. I already tried modifying this
class; HTML parsing is fine, but JSP parsing yields all the JSP tags as well
in the summary...
Thanks!
Pinky




RE: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-25 Thread MMachado
Maybe you need to write, for example, one JSP for the search interface and
another JSP that takes the word you are searching for; this second JSP page
passes it directly to a bean that uses Lucene to do the job.
Michel

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 25, 2003 3:46 PM
To: Lucene Users List
Subject: Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

On Monday 24 March 2003 18:03, Michael Wechner wrote:
> John Bresnik wrote:
> >anyone know of a quick and easy way to get this demo
> >[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
> >crawler to create a local [static] version of the site [i.e. they are no
> >longer "JSP" files, just the html output from the original JSP file] - but
> > in the interest of keeping the URL intact, I need to parse the JSP
> > extensions - the short question is, does anyone know of a way to *not*
> > ignore the *.jsp files?
>
> just modify IndexHTML: there is one line in there which decides what
> extension it will index.

There is another question I was wondering about: since JSP is not XML (i.e. it
cannot be reliably parsed using an XML or even HTML parser [or, for that matter,
even with the simplest XML markup tokenizer that ignores nesting]; it needs a
lower-level scanner), has anyone tried connecting an actual JSP processor to
Lucene? Or writing a simple one meant just for indexing, without having to
execute the embedded code?
[The problem with JSP compared to XML is that it need not nest properly with
the surrounding HTML content; one can use JSP inside attribute values, for
example. Thus, the JSP first has to be processed to HTML, and then the HTML
needs to be further tokenized.]

Jakarta has to have at least one such processor (I haven't looked at whether
there's a separate component or whether Tomcat just has one embedded). Of course
parsing JSP is problematic in many ways, not just stripping the JSP tags;
dynamic portions probably just have to be ignored, and all text inside
included (except for things inside comments).

-+ Tatu +-





Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-25 Thread Tatu Saloranta
On Monday 24 March 2003 18:03, Michael Wechner wrote:
> John Bresnik wrote:
> >anyone know of a quick and easy way to get this demo
> >[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
> >crawler to create a local [static] version of the site [i.e. they are no
> >longer "JSP" files, just the html output from the original JSP file] - but
> > in the interest of keeping the URL intact, I need to parse the JSP
> > extensions - the short question is, does anyone know of a way to *not*
> > ignore the *.jsp files?
>
> just modify IndexHTML: there is one line in there which decides what
> extension it will index.

There is another question I was wondering about: since JSP is not XML (i.e. it
cannot be reliably parsed using an XML or even HTML parser [or, for that matter,
even with the simplest XML markup tokenizer that ignores nesting]; it needs a
lower-level scanner), has anyone tried connecting an actual JSP processor to
Lucene? Or writing a simple one meant just for indexing, without having to
execute the embedded code?
[The problem with JSP compared to XML is that it need not nest properly with
the surrounding HTML content; one can use JSP inside attribute values, for
example. Thus, the JSP first has to be processed to HTML, and then the HTML
needs to be further tokenized.]

Jakarta has to have at least one such processor (I haven't looked at whether
there's a separate component or whether Tomcat just has one embedded). Of course
parsing JSP is problematic in many ways, not just stripping the JSP tags;
dynamic portions probably just have to be ignored, and all text inside
included (except for things inside comments).

-+ Tatu +-
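
A "simple one just meant for indexing" might be sketched as a low-level scanner that drops everything between <% and %> before the HTML parser sees the text. The names below (JspStripper, strip) are hypothetical and this is not the processor used by Tomcat. It is deliberately naive: it assumes "%>" never appears inside a scriptlet string literal, and it leaves custom tags and jsp: actions untouched.

```java
// Hypothetical scanner, not part of Lucene or Tomcat: removes JSP
// directives, scriptlets, expressions and JSP comments from a page
// source, leaving the surrounding HTML text untouched.
final class JspStripper {
    static String strip(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        while (pos < source.length()) {
            int open = source.indexOf("<%", pos);
            if (open < 0) {
                // No more JSP elements: keep the remaining text as-is.
                out.append(source, pos, source.length());
                break;
            }
            out.append(source, pos, open);
            // JSP comments end with "--%>", everything else with "%>".
            boolean comment = source.startsWith("<%--", open);
            String closer = comment ? "--%>" : "%>";
            int close = source.indexOf(closer, open + (comment ? 4 : 2));
            if (close < 0) {
                break; // unterminated element: drop the rest of the input
            }
            pos = close + closer.length();
        }
        return out.toString();
    }
}
```

Since the scan ignores HTML structure entirely, it also copes with JSP expressions placed inside attribute values: strip("<a href=\"<%= url %>\">x</a>") yields <a href="">x</a>. The dynamic portions are simply lost, as anticipated above.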





Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread John Bresnik
Ah, thanks. I couldn't find the demo classes [turns out they were in a
different dir] - thanks.

----- Original Message -----
From: "Michael Wechner" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, March 24, 2003 5:03 PM
Subject: Re: org.apache.lucene.demo.IndexHTML - parse JSP files?


> John Bresnik wrote:
>
> >anyone know of a quick and easy way to get this demo
> >[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
> >crawler to create a local [static] version of the site [i.e. they are no
> >longer "JSP" files, just the html output from the original JSP file] - but in
> >the interest of keeping the URL intact, I need to parse the JSP extensions -
> >the short question is, does anyone know of a way to *not* ignore the *.jsp
> >files?
>
> just modify IndexHTML: there is one line in there which decides what
> extension it will index.
>
> HTH
>
> Michael
>
> >thanks.






Re: org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread Michael Wechner
John Bresnik wrote:

> anyone know of a quick and easy way to get this demo
> [org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
> crawler to create a local [static] version of the site [i.e. they are no
> longer "JSP" files, just the html output from the original JSP file] - but in
> the interest of keeping the URL intact, I need to parse the JSP extensions -
> the short question is, does anyone know of a way to *not* ignore the *.jsp
> files?

just modify IndexHTML: there is one line in there which decides what
extension it will index.

HTH

Michael

> thanks.





org.apache.lucene.demo.IndexHTML - parse JSP files?

2003-03-24 Thread John Bresnik
Does anyone know of a quick and easy way to get this demo
[org.apache.lucene.demo.IndexHTML] to parse JSP files as well? I used a
crawler to create a local [static] version of the site [i.e. they are no
longer "JSP" files, just the HTML output from the original JSP files], but in
the interest of keeping the URLs intact, I need to parse the .jsp extensions.
The short question is: does anyone know of a way to *not* ignore the *.jsp
files?

thanks.
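
Keeping the URL intact when indexing a crawled mirror comes down to mapping each local file back to the address it was fetched from. The sketch below is one way to do that, assuming the crawler saved files under a directory that mirrors the site's path layout. MirrorUrls and toUrl are hypothetical names, and no URL-encoding of special characters is attempted.

```java
import java.nio.file.Path;

// Hypothetical helper: turns the location of a mirrored file back into
// the URL it was crawled from, so the index can store the live address.
final class MirrorUrls {
    static String toUrl(String baseUrl, Path mirrorRoot, Path file) {
        Path relative = mirrorRoot.relativize(file);
        StringBuilder url = new StringBuilder(baseUrl);
        if (!baseUrl.endsWith("/")) {
            url.append('/');
        }
        // Join path elements with '/' regardless of the platform separator.
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) {
                url.append('/');
            }
            url.append(relative.getName(i).toString());
        }
        return url.toString();
    }
}
```

The resulting string is what would be stored with the indexed document so that search results link to the original .jsp address instead of the local copy.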






RE: how to search .pdf, .doc, .jsp files using LUCENE ?

2003-02-20 Thread Borkenhagen, Michael (ofd-ko zdfin)
Hi,

this is an FAQ ...

-----Original Message-----
From: Naraharasetty Ravi Kumar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 20, 2003 5:38 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: how to search .pdf, .doc, .jsp files using LUCENE ?


Hi All,

I have used Lucene's demo web application. I am developing a website that
uses JSP, servlets, and Java files, and I need to implement a search engine
for it using Lucene. I implemented the search with Lucene and can search
.html and .txt files, but how do I search .pdf, .doc, .jsp, etc. with
Lucene?

Basically, to implement search on .html and .txt files, I created the index
using the command-prompt instructions below:

C:\DarrenWebsite\Lucene>java org.apache.lucene.demo.IndexHTML -create -index Index "..\html"
adding ../html/AboutUs/aboutus.htm
adding ../html/AboutUs/milestones.htm
adding ../html/AboutUs/ourmethodology.htm
adding ../html/AboutUs/ourmission.htm
adding ../html/AboutUs/ourpeople.htm
adding ../html/AboutUs/ourwork.htm
adding ../html/Careers/careerpath.htm
adding ../html/Careers/careers.htm
adding ../html/Careers/opportunities.htm
adding ../html/Clients/clients.htm
adding ../html/ContactUs/contactus.htm
adding ../html/Home/legaldisclaim.htm
adding ../html/Home/sitemap.htm
adding ../html/Images/Menu/menu.htm
adding ../html/Partners/partners.htm
adding ../html/Products/edynamo.htm
adding ../html/Products/packaged_edapters.htm
adding ../html/Products/products.htm
adding ../html/Services/bc.htm
adding ../html/Services/crm.htm
adding ../html/Services/eai.htm
adding ../html/Services/erp.htm
adding ../html/Services/scm.htm
adding ../html/Services/services.htm
adding ../html/a.txt
adding ../html/b.txt
Optimizing index...
2625 total milliseconds




And in my configuration.jsp I have the entry below:
String indexLocation = "/DarrenWebsite/Lucene/Index";

That is all I did to implement search on .html and .txt
files. Now how do I implement search on .pdf, .doc, .jsp,
etc.?




Thanks & Regards,
Ravi.






how to search .pdf, .doc, .jsp files using LUCENE ?

2003-02-20 Thread Naraharasetty Ravi Kumar
Hi All,

I have used Lucene's demo web application. I am developing a website that
uses JSP, servlets, and Java files, and I need to implement a search engine
for it using Lucene. I implemented the search with Lucene and can search
.html and .txt files, but how do I search .pdf, .doc, .jsp, etc. with
Lucene?

Basically, to implement search on .html and .txt files, I created the index
using the command-prompt instructions below:

C:\DarrenWebsite\Lucene>java org.apache.lucene.demo.IndexHTML -create -index Index "..\html"
adding ../html/AboutUs/aboutus.htm
adding ../html/AboutUs/milestones.htm
adding ../html/AboutUs/ourmethodology.htm
adding ../html/AboutUs/ourmission.htm
adding ../html/AboutUs/ourpeople.htm
adding ../html/AboutUs/ourwork.htm
adding ../html/Careers/careerpath.htm
adding ../html/Careers/careers.htm
adding ../html/Careers/opportunities.htm
adding ../html/Clients/clients.htm
adding ../html/ContactUs/contactus.htm
adding ../html/Home/legaldisclaim.htm
adding ../html/Home/sitemap.htm
adding ../html/Images/Menu/menu.htm
adding ../html/Partners/partners.htm
adding ../html/Products/edynamo.htm
adding ../html/Products/packaged_edapters.htm
adding ../html/Products/products.htm
adding ../html/Services/bc.htm
adding ../html/Services/crm.htm
adding ../html/Services/eai.htm
adding ../html/Services/erp.htm
adding ../html/Services/scm.htm
adding ../html/Services/services.htm
adding ../html/a.txt
adding ../html/b.txt
Optimizing index...
2625 total milliseconds




And in my configuration.jsp I have the entry below:
String indexLocation = "/DarrenWebsite/Lucene/Index";

That is all I did to implement search on .html and .txt
files. Now how do I implement search on .pdf, .doc, .jsp,
etc.?




Thanks & Regards,
Ravi.






Re: summary text for indexed jsp files -- modify the HTMLParser.jj

2002-08-14 Thread Jon Wasson

My bad... you would have to redo this line in the file
IndexHTML.java...

else if (file.getPath().endsWith(".html") || // index .html files
         file.getPath().endsWith(".htm") || // index .htm files


--- Jon Wasson <[EMAIL PROTECTED]> wrote:
> Karen -
> There is probably a simpler solution... why parse the
> local .jsp file at all?  Grab it with a web crawler and
> write it locally to your disk in a temp directory.
> Then parse the file with its content as though it
> were a complete HTML file.  This way, instead of
> dealing with files by extension
> (.jsp, .html, .asp, etc.), you can deal with them by MIME
> type, thus lumping them all together.  This can take
> care of messy issues like how to get at Lotus Notes
> documents (without going through the client) or how to
> parse an .asp file vs. a .jsp file.  And you can use
> the standard IndexHTML file as it currently is. Just a
> suggestion.
>
> Jon
>
>   karen bran <[EMAIL PROTECTED]>
>   08/12/2002 04:28 PM
>   Please respond to "Lucene Users List"
>
>   To: [EMAIL PROTECTED]
>   cc:
>   Subject: summary text for indexed jsp files -- modify the HTMLParser.jj
>
> Hello,
>
> I modified IndexHTML.java to let the JSP files be
> indexed, but the source code of JSP tags such as
> <%@page import... shows up in the result summary.
>
> I checked this mailing list's messages; someone
> suggested modifying the HTMLParser.jj file to treat the
> JSP tag text as a third kind of comment. Since I am not
> familiar with the JavaCC grammar, I don't know how to
> hack HTMLParser.jj and insert the third comment
> tag for the JSP tag.
>
> Here are the two existing comment tags in
> HTMLParser.jj. Can someone help me figure out how
> to add the third one?
>
> Thanks a lot.
>
> <WithinComment1> TOKEN :
> {
>   < CommentText1:  (~["-"])+ | "-" >
> | < CommentEnd1:   "-->" > : DEFAULT
> }
>
> <WithinComment2> TOKEN :
> {
>   < CommentText2:  (~[">"])+ >
> | < CommentEnd2:   ">" > : DEFAULT
> }
>
> <WithinComment3> TOKEN :
> {
>   < CommentText3:  ?? >
> | < CommentEnd3:   ??> : DEFAULT
> }
> 
> 
> 




Re: summary text for indexed jsp files -- modify the HTMLParser.jj

2002-08-14 Thread Jon Wasson

Karen - 
There is probably a simpler solution... why parse the
local .jsp file at all?  Grab it with a web crawler and
write it locally to your disk in a temp directory.
Then parse the file with its content as though it
were a complete HTML file.  This way, instead of
dealing with files by extension
(.jsp, .html, .asp, etc.), you can deal with them by MIME
type, thus lumping them all together.  This can take
care of messy issues like how to get at Lotus Notes
documents (without going through the client) or how to
parse an .asp file vs. a .jsp file.  And you can use
the standard IndexHTML file as it currently is. Just a
suggestion.

Jon
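
Dealing with fetched pages by MIME type rather than by extension could be sketched as below. The assumption is that the crawler has the Content-Type header value at hand (for example from java.net.URLConnection.getContentType()). ContentTypes and isHtml are hypothetical names, and treating application/xhtml+xml as HTML is a choice made for this sketch.

```java
import java.util.Locale;

// Hypothetical helper: classifies a fetched resource by the Content-Type
// header the server sent rather than by the file extension in its URL.
final class ContentTypes {
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }
}
```

With a check like this, a rendered .jsp or .asp page that the server reports as HTML takes the same path through the indexer as a static .html file.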





karen bran <[EMAIL PROTECTED]>
08/12/2002 04:28 PM
Please respond to "Lucene Users List"
 
 To: [EMAIL PROTECTED]
 cc:
 Subject: summary text for indexed jsp files -- modify the HTMLParser.jj



Hello,

I modified IndexHTML.java to let the JSP files be
indexed, but the source code of JSP tags such as
<%@page import... shows up in the result summary.

I checked this mailing list's messages; someone
suggested modifying the HTMLParser.jj file to treat the
JSP tag text as a third kind of comment. Since I am not
familiar with the JavaCC grammar, I don't know how to
hack HTMLParser.jj and insert the third comment
tag for the JSP tag.

Here are the two existing comment tags in
HTMLParser.jj. Can someone help me figure out how
to add the third one?

Thanks a lot.

 

<WithinComment1> TOKEN :
{
  < CommentText1:  (~["-"])+ | "-" >
| < CommentEnd1:   "-->" > : DEFAULT
}

<WithinComment2> TOKEN :
{
  < CommentText2:  (~[">"])+ >
| < CommentEnd2:   ">" > : DEFAULT
}

<WithinComment3> TOKEN :
{
  < CommentText3:  ?? >
| < CommentEnd3:   ??> : DEFAULT
}







summary text for indexed jsp files -- modify the HTMLParser.jj

2002-08-12 Thread karen bran


Hello,

I modified IndexHTML.java to let the JSP files be indexed, but the source code of
JSP tags such as <%@page import... shows up in the result summary.

I checked this mailing list's messages; someone suggested modifying the HTMLParser.jj
file to treat the JSP tag text as a third kind of comment. Since I am not familiar with
the JavaCC grammar, I don't know how to hack HTMLParser.jj and insert the third
comment tag for the JSP tag.

Here are the two existing comment tags in HTMLParser.jj. Can someone help me
figure out how to add the third one?

Thanks a lot.

 

<WithinComment1> TOKEN :
{
  < CommentText1:  (~["-"])+ | "-" >
| < CommentEnd1:   "-->" > : DEFAULT
}

<WithinComment2> TOKEN :
{
  < CommentText2:  (~[">"])+ >
| < CommentEnd2:   ">" > : DEFAULT
}

<WithinComment3> TOKEN :
{
  < CommentText3:  ?? >
| < CommentEnd3:   ??> : DEFAULT
}


