Well, there's definitely a need for a poor man's search engine.
But I think you need to keep two things separate:
1) the search engine
2) the spider (to insert into the engine's db). Well, you don't need to call
it a spider; it's more an interface to gather info. Spidering is only one of
the possibilities.

I think 1 can be pretty general. It has an obvious interface and there might
be a few different implementations (e.g. one based on a db, another on a
file).
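
To make the "obvious interface" a bit more concrete, here is a minimal
sketch. The names SearchEngine, MemorySearchEngine, add and search are my own
assumptions, not an existing API, and the in-memory map simply stands in for
a db-backed or file-backed implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical engine interface: it only knows about ids and text. */
interface SearchEngine {
    /** Index the given text under an id (a URL, for instance). */
    void add(String id, String text);

    /** Return the ids of all entries containing the word, in sorted order. */
    List<String> search(String word);
}

/** One possible implementation: an in-memory inverted index (word -> ids). */
class MemorySearchEngine implements SearchEngine {
    private final Map<String, Set<String>> index = new HashMap<>();

    @Override
    public void add(String id, String text) {
        // Lower-case the text and split on runs of non-word characters.
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
    }

    @Override
    public List<String> search(String word) {
        Set<String> hits = index.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }
}
```

A db-based or file-based engine would implement the same two methods, so
whatever gathers the info never has to know which one it is talking to.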

But the spider will depend very much on the project. In some projects, all
info is in the db. In that case, there's no need to crawl through the pages
at all.
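
As a rough sketch of what such a gathering interface could look like when the
info already sits in the db: Page, ContentSource and TableSource are
hypothetical names, and the map stands in for the rows a real query would
return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical unit of indexable content: where it lives and what it says. */
final class Page {
    final String url;
    final String text;

    Page(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

/** Hypothetical gathering interface; a spider would be just one implementation. */
interface ContentSource {
    List<Page> gather();
}

/** Stand-in for a project whose content is already stored in a db table. */
class TableSource implements ContentSource {
    private final Map<Integer, String> rows; // id -> text, in place of a real query
    private final String urlPrefix;          // page that displays a row, e.g. "/product.jsp?id="

    TableSource(Map<Integer, String> rows, String urlPrefix) {
        this.rows = rows;
        this.urlPrefix = urlPrefix;
    }

    @Override
    public List<Page> gather() {
        List<Page> pages = new ArrayList<>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            // The text comes straight from the table; no page is ever requested.
            pages.add(new Page(urlPrefix + row.getKey(), row.getValue()));
        }
        return pages;
    }
}
```

A crawling spider would be a second ContentSource that fetches pages instead
of reading rows; the engine side stays the same either way.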

> Search engines are now a web site / app fundamental. If you don't have
> one, or don't bother offering "search this site", you look sad. The same
> goes if you already have one and it does not work properly, is very
> limited, and/or tries to do too much
> ( "http://www.javascript.com/" or "http://www.docjs.com/" ).
>
> I was just looking at Java Network Programming by Elliotte Rusty Harold
> (O'Reilly, 2nd edition), where there is an example of using the Swing HTML
> document / parser package `javax.swing.text.html.*' to trawl through a web
> page and print the hyperlinks. That gave me a brilliant idea of
> implementing a poor man's search engine. You could take this example as
> the basis for a poor man's JSP search engine. But maybe I don't want to
> reinvent the wheel for the Nth time in this new century.
>
> Now, if you forget about indexing words for now, I can see two methods
> of doing this for JSP pages.
>
> 1) Look at the file system and grab "*.jsp" using FileReader or something.
> Effectively you are looking at raw JSP pages, which have a mixture of Java
> and HTML.
>
> Pros:
> You don't have to deal with security.
> It is going to be fast on the same server.
>
> Cons:
> Potentially mixtures of Java and HTML.
> Restricted to the web server.

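And you might miss the most crucial information, since whatever a page
generates at run time (from the db, for instance) never shows up in the raw
JSP source. I really think this is a bad idea.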


>
> 2) Grab the JSP pages using a java.net.URL object; effectively you
> are surfing your web app.
>
> Pros:
>
> Most portable solution
>
> Cons:
>
> Big problem: all web apps have some form of security built in. Usually it
> is either custom form based or (intranet-wise) web app realm based.
> How do you get your search engine to authenticate itself through its
> own web app? This is important for e-commerce sites where you want
> to be able to list pages that are deemed protected resources.
>

Not only that, but there's also the problem that many pages need parameters
(sent by either POST or GET), and the combination of all JSP pages with all
their parameter values might make your search engine db explode.
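
To put a number on that: the count of distinct URLs for one page is the
product of the number of values each parameter can take. A small sketch, with
UrlCount as a hypothetical name and made-up figures:

```java
import java.util.Arrays;
import java.util.List;

class UrlCount {
    /** Distinct URLs for one page: the product of the value counts per parameter. */
    static long combinations(List<Integer> valuesPerParameter) {
        long total = 1;
        for (int n : valuesPerParameter) {
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            total = Math.multiplyExact(total, (long) n);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up page: 12 categories x 50 result offsets x 4 sort orders.
        System.out.println(combinations(Arrays.asList(12, 50, 4)));
    }
}
```

With those made-up figures a single JSP page already yields 2400 URLs for the
spider to fetch and index, and that is before multiplying by the number of
pages in the app.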

I like the idea, but I really think it's not all that obvious, especially
the spider stuff.
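
For reference, the link-printing step from the quoted message is the easy
part. Here is a minimal sketch using the JDK's Swing HTML parser
(javax.swing.text.html); LinkExtractor is a hypothetical name, and the sketch
does nothing about authentication or parameters, which is where the real
work is.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor {
    /** Collect the href value of every <a> start tag, in document order. */
    static List<String> links(Reader html) throws IOException {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add(href.toString());
                    }
                }
            }
        };
        // 'true' tells the parser to ignore any charset declared inside the page.
        new ParserDelegator().parse(html, callback, true);
        return found;
    }
}
```

To surf the app itself you would hand it something like
new InputStreamReader(new URL(pageUrl).openStream()), with pageUrl being
whatever page you start from, and then queue the returned links for the next
round.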

Geert Van Damme
