Jeremy Howard wrote:
>
> I'm glad you brought this up again. Since I mentioned I'd be happy to
> host such a thing, and asked for suggestions, I've got
> a total of one (from Stas--thanks!). That suggestion was to use
> ht://dig <http://www.htdig.org/>.
>
> Has anyone got a search engine up and running that they're happy with?
> Stas has made the good point that it needs to be able
> to highlight found words, since the pages are quite large. If anyone has
> a chance to do a bit of research about (free) search
> engines, I'd really appreciate it if you could let me know what you
> find out. It'd be nice publicity if it was mod_perl based,
> I guess, but it doesn't really matter.

I'm happy with ht://dig; I mainly use it (instead of grep) for looking up
docs I've squirreled away in /manual.

It's been a while since I've been to htdig.org, but I did grab a tarball
recently, so I'm fairly confident there *isn't* an existing mod_perl
wrapper -- but maybe there should be.

There are a number of Perl scripts in the distribution, and I *thought*
there was a plain Perl wrapper, but I could be mistaken.

I think a mod_perl frontend/wrapper could work well: htsearch is 900K+
and takes a moment to fire up (on my box, anyway), so how much worse
could it be?

OTOH, one *could* (conceivably) get crazy, access the DBs directly, and
maybe XS any needed portions of htsearch (ambitious :-). That would still
leave htdig, htfuzzy, htmerge, etc. to handle the indexing.

As far as highlighting goes, I have a piece of code I'm using that we
could take as a starting point. The downside is that it uses $` and $'
(it can probably be tweaked to avoid them), but it handles the critical
stuff, like skipping keywords inside hrefs/tags.
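
A minimal sketch of that kind of highlighter, written without $`, $& or
$' (this is not the code mentioned above; the highlight() name and the
<b> markup are hypothetical choices). It splits the page on tags with a
capturing split, so tags (and any href inside them) come out as separate
pieces, and only the text pieces get highlighted:

```perl
use strict;
use warnings;

# Wrap each search keyword in <b>...</b>, but only in the text between
# tags, so keywords inside tags and attributes (e.g. href="...") are left
# alone. No $`, $& or $' anywhere.
sub highlight {
    my ($html, @words) = @_;
    @words = grep { defined && length } @words;    # ignore empty keywords
    return $html unless @words;

    my $alt = join '|', map { quotemeta } @words;
    my $re  = qr/\b($alt)\b/i;                      # whole words, any case

    # With one capturing group, split returns text, tag, text, tag, ...
    # (a leading '' is kept when $html starts with a tag), so even indices
    # are always text and odd indices are always tags.
    my @pieces = split /(<[^>]*>)/, $html;
    for my $i (grep { $_ % 2 == 0 } 0 .. $#pieces) {
        $pieces[$i] =~ s/$re/<b>$1<\/b>/g;
    }
    return join '', @pieces;
}

print highlight('<a href="/guide/perl.html">perl guide</a> for Perl users', 'perl'), "\n";
```

This prints `<a href="/guide/perl.html"><b>perl</b> guide</a> for <b>Perl</b> users`;
the "perl" inside the href is untouched. The naive `<[^>]*>` pattern will
be fooled by comments or by a `>` inside an attribute value, so treat it
as a starting point only.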

RE: Matt Sergeant -- perhaps highlighting is overrated, but it usually
doesn't hurt. I too have a proprietary search facility and an
inverted-index prototype (it stores packed doc-id integers in MySQL, for
example) -- but a great deal of work has gone into ht://dig...
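
A rough sketch of the packed doc-id idea, with a plain hash standing in
for the MySQL table. The %index layout, the 4-byte pack 'N' encoding and
the function names are assumptions for illustration, not the prototype's
actual schema:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MySQL table: word => packed posting list.
# In a real setup each value would presumably live in a BLOB column.
my %index;

# Append a doc-id as a 32-bit unsigned big-endian integer (pack 'N'),
# so each document costs 4 bytes in a word's posting list.
sub add_posting {
    my ($word, $doc_id) = @_;
    $index{lc $word} .= pack 'N', $doc_id;
}

# Decode a word's posting list back into doc-ids.
sub postings {
    my ($word) = @_;
    my $blob = $index{lc $word};
    return defined $blob ? unpack('N*', $blob) : ();
}

# AND query: doc-ids that occur in every query word's posting list.
sub search_all {
    my @words = @_;
    return () unless @words;
    my %count;
    for my $w (@words) {
        my %seen;
        @seen{ postings($w) } = ();    # count each doc once per word
        $count{$_}++ for keys %seen;
    }
    return sort { $a <=> $b } grep { $count{$_} == @words } keys %count;
}

add_posting('mod_perl', $_) for 3, 7, 42;
add_posting('guide',    $_) for 7, 42, 99;

my @hits = search_all('mod_perl', 'guide');
print "@hits\n";
print length($index{'mod_perl'}), "\n";
```

It prints `7 42` and then `12` (three ids at 4 bytes each). Keeping a
whole posting list as one packed value keeps it compact and lets a query
fetch each word's list in one go.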


> My only concern is that it seems a little odd to keep this just to the
> Guide. Wouldn't it be useful for the rest of
> perl.apache.org? I wouldn't have thought it's much extra work to add a
> drop-down box to search specific areas of the site
> (the Guide being one)...

I'd have to agree there.


> If there's a good reason to have the Guide's search engine separate to
> the rest of perl.apache.org, should it have a
> separate domain (modperlguide.org?, guide.perl.apache.org?)?
> 
> --
> Jeremy Howard

ht://dig accepts a 'restrict' parameter (e.g. restrict => /to_this_directory),
which might be useful for separating the areas.
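
A sketch of how a drop-down choice could map onto that parameter. The
area names, the /guide/ and /docs/ patterns and the /cgi-bin/htsearch
location are hypothetical; words and restrict are htsearch form inputs,
but check the htsearch docs for exactly how restrict patterns are matched:

```perl
use strict;
use warnings;

# Hypothetical drop-down values mapped to URL patterns on perl.apache.org.
my %area = (
    all   => '',          # no restriction
    guide => '/guide/',   # hypothetical path
    docs  => '/docs/',    # hypothetical path
);

# Percent-encode everything except RFC 3986 unreserved characters
# (assumes plain ASCII input).
sub escape_param {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an htsearch query URL, adding 'restrict' only when an area is chosen.
sub htsearch_url {
    my ($words, $choice) = @_;
    my %param = (words => $words);
    my $restrict = $area{$choice} // '';
    $param{restrict} = $restrict if length $restrict;
    return '/cgi-bin/htsearch?'
         . join('&', map { $_ . '=' . escape_param($param{$_}) } sort keys %param);
}

print htsearch_url('Apache::Registry', 'guide'), "\n";
print htsearch_url('mod_perl', 'all'), "\n";
```

This prints `/cgi-bin/htsearch?restrict=%2Fguide%2F&words=Apache%3A%3ARegistry`
and `/cgi-bin/htsearch?words=mod_perl`; picking "all" simply omits restrict.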

Count me in, whatever we choose.

-Jay J

# use Text::Wrapper;
