Re: search engine module?

2001-10-18 Thread Kee Hinckley

People have been talking about backend search engines, but when I saw 
the subject I was thinking more about front end classes.  In 
particular, last time I looked there wasn't a standard class for 
integrating local search engines into your code.  I ended up making a 
WWW::Search, but you kind of have to tweak the meaning of some 
values.  If anyone is interested I ought to release it.  It's a 
trivial example for very small web sites (it provides google-like 
search syntax, and backends it with grep).
-- 

Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
[EMAIL PROTECTED] (or ...!alice!nazgul for time travelers :-)

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.




Re: search engine module?

2001-10-18 Thread Stas Bekman

Kee Hinckley wrote:

> People have been talking about backend search engines, but when I saw
> the subject I was thinking more about front end classes.  In
> particular, last time I looked there wasn't a standard class for
> integrating local search engines into your code.  I ended up making a
> WWW::Search, but you kind of have to tweak the meaning of some
> values.  If anyone is interested I ought to release it.  It's a
> trivial example for very small web sites (it provides google-like
> search syntax, and backends it with grep).


You should have checked CPAN first: There is a load of WWW::Search:: 
modules there.





_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




Re: search engine module?

2001-10-18 Thread Kee Hinckley

At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
> Kee Hinckley wrote:
>
>> People have been talking about backend search engines, but when I
>> saw the subject I was thinking more about front end classes.  In
>> particular, last time I looked there wasn't a standard class for
>> integrating local search engines into your code.  I ended up making
>> a WWW::Search, but you kind of have to tweak the meaning of some
>> values.  If anyone is interested I ought to release it.  It's a
>> trivial example for very small web sites (it provides google-like
>> search syntax, and backends it with grep).
>
> You should have checked CPAN first: There is a load of WWW::Search::
> modules there.

Yes.  But my point is that they are all *offsite* searches as far as 
I can tell.  What I wanted was a standard interface to a local search 
engine.



Re: search engine module?

2001-10-18 Thread Stas Bekman

Kee Hinckley wrote:

> At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
>
>> Kee Hinckley wrote:
>>
>>> People have been talking about backend search engines, but when I
>>> saw the subject I was thinking more about front end classes.  In
>>> particular, last time I looked there wasn't a standard class for
>>> integrating local search engines into your code.  I ended up making
>>> a WWW::Search, but you kind of have to tweak the meaning of some
>>> values.  If anyone is interested I ought to release it.  It's a
>>> trivial example for very small web sites (it provides google-like
>>> search syntax, and backends it with grep).
>>
>> You should have checked CPAN first: There is a load of WWW::Search::
>> modules there.
>
> Yes.  But my point is that they are all *offsite* searches as far as
> I can tell.  What I wanted was a standard interface to a local search
> engine.

Right, my point is that WWW::Search namespace is taken :)





Re: search engine module?

2001-10-18 Thread Kee Hinckley

At 11:36 AM +0800 10/19/01, Stas Bekman wrote:
> Right, my point is that WWW::Search namespace is taken :)

Ah.  Sorry, my miscommunication.  When I said that I ended up making
a WWW::Search, I should have put "an instance of" in there instead
of "a".  Basically WWW::Search provided a good interface, but
everything was remote, so I wrote this.  If you stick to the
conventions provided here, it should be easy to make other variations
using other local search engines.  I was just surprised that nobody
seemed to have done it before.

Grep(3)        User Contributed Perl Documentation        Grep(3)


NAME
    WWW::Search::Grep - class for searching a local web site
    using grep

SYNOPSIS
    require WWW::Search;
    $search = new WWW::Search('Grep');

DESCRIPTION
    This is a grep specialization of WWW::Search.

    This class exports no public interface; all interaction
    should be done through WWW::Search objects.

OPTIONS
    The default query syntax is:

        word word OR word "quoted phrase"

    Blank-separated words are implicitly separated by AND.
    OR refers only to the words or phrases directly to either
    side.  The model is the same as that used by Google
    (http://www.google.com/).

    search_url
        Specifies the directory to search.  All .html and .htm
        files in the specified directory and any
        subdirectories will be searched.  This is an absolute
        pathname and is required.  E.g.
        /home/httpd/html/foo/searchdir/

    base_path
        This is the part of that pathname that should
        be stripped off before prefixing the base_url.  This
        is required.  E.g. /home/httpd/html/

    base_url
        This is prepended to the pathname after stripping the
        base_path.  This is optional; the default is none.
        E.g. http://www.somewhere.com/ or /

    search_debug, search_parse_debug
        See WWW::Search

    grep
        Pathname to grep; default is /bin/egrep.

AUTHOR
    Kee Hinckley, [EMAIL PROTECTED]
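
The query model and the path-to-URL options described in the man page can be sketched in a few lines of core Perl. This is only an illustration, not the code of WWW::Search::Grep: the function names (parse_query, matches, path_to_url) are hypothetical, and the case-sensitive substring test is an assumption about how a grep-backed match might behave.

```perl
use strict;
use warnings;

# Parse 'word word OR word "quoted phrase"' into AND-groups of alternatives,
# e.g. [ ['word'], ['word', 'word'], ['quoted phrase'] ].
# OR binds only to the terms directly on either side of it.
sub parse_query {
    my ($query) = @_;
    my (@groups, $pending_or);
    while ($query =~ /"([^"]+)"|(\S+)/g) {
        my ($phrase, $word) = ($1, $2);
        if (defined $word && $word eq 'OR') {
            $pending_or = 1 if @groups;   # a leading OR has nothing to its left
            next;
        }
        my $term = defined $phrase ? $phrase : $word;
        if ($pending_or) { push @{ $groups[-1] }, $term }
        else             { push @groups, [$term] }
        $pending_or = 0;
    }
    return \@groups;
}

# A text matches when every AND-group has at least one alternative that
# occurs in it as a literal, case-sensitive substring (an assumption).
sub matches {
    my ($groups, $text) = @_;
    for my $group (@$groups) {
        return 0 unless grep { index($text, $_) >= 0 } @$group;
    }
    return 1;
}

# Strip base_path from a file's pathname and prepend base_url (default: none).
sub path_to_url {
    my ($path, $base_path, $base_url) = @_;
    $base_url = '' unless defined $base_url;
    my $rel = $path;
    $rel =~ s/^\Q$base_path\E// or die "$path is not under $base_path\n";
    return $base_url . $rel;
}

my $groups = parse_query('perl apache OR java "search engine"');
print matches($groups, 'a perl search engine written in java') ? "hit\n" : "miss\n";
print path_to_url('/home/httpd/html/foo/searchdir/index.html',
                  '/home/httpd/html/', 'http://www.somewhere.com/'), "\n";
```

With the man page's example values, the last call yields http://www.somewhere.com/foo/searchdir/index.html; passing / as base_url would give a site-relative link instead.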





Re: search engine module?

2001-10-17 Thread Stas Bekman

Daniel Sully wrote:

> Is the engine used at the math forum publicly available?

I don't know. Why don't you ask them :)

> Once upon a time Stas Bekman shaped the electrons to say...
>
>> the engine at mathforum does a great job, it's the best mailing list
>> archive search engine that I've ever seen, in regards to searching Perl
>> strings and code in general. Just make sure to use the right options at:
>> http://mathforum.org/discussions/epi-search/modperl.html

  




Re: search engine module?

2001-10-17 Thread Oleg Bartunov

We use OpenFTS (http://openfts.sourceforge.net) for the
PostgreSQL mailing list archive (http://fts.postgresql.org).

Regards,
Oleg
On Wed, 17 Oct 2001, Stas Bekman wrote:

> Ged Haywood wrote:
>
>> Hi all,
>>
>> On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
>>
>>> On Fri, 12 Oct 2001, Perrin Harkins wrote:
>>>
>>> [...]
>>>> Plus lots of other stuff like Glimpse and Swish which interface to C-based
>>>> engines.
>>>
>>> I've had good luck with http://swish-e.org/2.2/
>>
>> Please make sure that it's possible to do a plain ordinary literal
>> text string search.  Nothing fancy, no case-folding, no automatic
>> removal of punctuation, nothing like that.  Just a literal string.
>>
>> Last night I tried to find "perl -V" on all the search engines
>> mentioned on the mod_perl home page and they all failed in various
>> interesting ways.
>>
>> If somebody knows what I'm doing wrong, please post.
>
> the engine at mathforum does a great job, it's the best mailing list
> archive search engine that I've ever seen, in regards to searching Perl
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html

_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




Re: search engine module?

2001-10-17 Thread Daniel Sully

Is the engine used at the math forum publicly available?

Once upon a time Stas Bekman shaped the electrons to say...

> the engine at mathforum does a great job, it's the best mailing list
> archive search engine that I've ever seen, in regards to searching Perl
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html
 
-D
--
Zim I am the neighbourhood baby inspector. I have come to inspect the baby.
Mother Oh, goodness! Inspect him for what?
Zim YOUR RESISTANCE WILL BE NOTED!



Re: search engine module?

2001-10-16 Thread Ged Haywood

Hi all,

On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:

> On Fri, 12 Oct 2001, Perrin Harkins wrote:
>
> [...]
>> Plus lots of other stuff like Glimpse and Swish which interface to C-based
>> engines.
>
> I've had good luck with http://swish-e.org/2.2/

Please make sure that it's possible to do a plain ordinary literal
text string search.  Nothing fancy, no case-folding, no automatic
removal of punctuation, nothing like that.  Just a literal string.

Last night I tried to find "perl -V" on all the search engines
mentioned on the mod_perl home page and they all failed in various
interesting ways.

If somebody knows what I'm doing wrong, please post.

73,
Ged.




RE: search engine module?

2001-10-16 Thread Matt Sergeant

> -----Original Message-----
> From: Ged Haywood [mailto:[EMAIL PROTECTED]]
>
> Hi all,
>
> On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
>
>> On Fri, 12 Oct 2001, Perrin Harkins wrote:
>>
>> [...]
>>> Plus lots of other stuff like Glimpse and Swish which interface to
>>> C-based engines.
>>
>> I've had good luck with http://swish-e.org/2.2/
>
> Please make sure that it's possible to do a plain ordinary literal
> text string search.  Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that.  Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.
>
> If somebody knows what I'm doing wrong, please post.

I've written an RDBMS-backed search engine that could do such queries, but it
gets expensive after a while, as you have to do table scans of the full text
of every page in your DBMS to find the match. One thing I could have done
would be to split up the match so it would try to match "perl" and "-V"
separately, before doing the full-text search on that subset of results.
Never got around to doing that though. I suspect most search engines (with the
possible exception of Google) are in the same or a similar boat.
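
The two-phase idea can be sketched with an in-memory inverted index standing in for the database. Everything here is hypothetical (the %pages data, the function names, the tokenising rule), and it assumes the phrase begins and ends on word boundaries, so a phrase that only matches inside longer words would be missed by the first phase.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the table of pages: id => full text.
my %pages = (
    1 => 'Run perl -V to see how perl was built.',
    2 => 'The -V switch of some other tool.',
    3 => 'perl is great; -V is a switch.',
);

# Split text into terms. A dash may start a term (so "-V" survives),
# but trailing dashes and dots are not part of it. No case folding.
sub terms {
    my ($text) = @_;
    my @terms = $text =~ /(-?\w(?:[\w-]*\w)?)/g;
    return @terms;
}

# Inverted index: term => { page id => 1 }.
sub build_index {
    my ($pages) = @_;
    my %index;
    while (my ($id, $text) = each %$pages) {
        $index{$_}{$id} = 1 for terms($text);
    }
    return \%index;
}

sub literal_search {
    my ($phrase, $pages, $index) = @_;
    my @words = terms($phrase);
    return () unless @words;

    # Phase 1: cheap index lookups keep only pages containing every word.
    my $first      = shift @words;
    my @candidates = keys %{ $index->{$first} || {} };
    for my $word (@words) {
        my $set = $index->{$word} || {};
        @candidates = grep { $set->{$_} } @candidates;
    }

    # Phase 2: exact, case-sensitive substring test on the survivors only.
    return sort { $a <=> $b }
           grep { index($pages->{$_}, $phrase) >= 0 } @candidates;
}

my $index = build_index(\%pages);
print join(', ', literal_search('perl -V', \%pages, $index)), "\n";   # prints "1"
```

Pages 1 and 3 survive the first phase because both contain "perl" and "-V" somewhere; only page 1 contains the literal string, so the expensive scan runs on two pages rather than all three.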

Matt.




Re: search engine module?

2001-10-16 Thread Perrin Harkins

> Please make sure that it's possible to do a plain ordinary literal
> text string search.  Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that.  Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.

The amazingly fast ht://Dig (http://www.htdig.org/) engine can do phrase
searching, but I'm not certain how well it does with punctuation.
- Perrin




Re: [OT] search engine module?

2001-10-16 Thread Bill Moseley

At 02:04 PM 10/16/2001 +0100, Ged Haywood wrote:
>>> Plus lots of other stuff like Glimpse and Swish which interface to
>>> C-based engines.
>>
>> I've had good luck with http://swish-e.org/2.2/
>
> Please make sure that it's possible to do a plain ordinary literal
> text string search.  Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that.  Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.

I assume it depends on how the search engine is configured.  With Swish, for
example, you can define which chars make up a word.  I'm not sure what you
mean by a literal string.  For performance reasons you can't just grep for
words (or parts of words), so you have to extract the words from the text
during indexing.  You might define that a dash is OK at the start of a word
but not at the end, and ignore trailing dots, so you could find "-V" and
"-V." (at the end of a sentence).
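
A minimal sketch of that kind of word-extraction rule in plain Perl. It is not Swish's configuration syntax; the function name index_terms and the choice of word characters (letters, digits, underscore, dash) are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Extract index terms from text. A term may begin with a dash, must
# contain at least one word character, and never ends with a dash or
# a dot, so "-V" and "-V." both yield the term "-V".
sub index_terms {
    my ($text) = @_;
    my @terms = $text =~ /(-?\w(?:[\w-]*\w)?)/g;
    return @terms;
}

print join(' | ', index_terms('Run perl -V. Then mod_perl is C-based -')), "\n";
# prints "Run | perl | -V | Then | mod_perl | is | C-based"
```

Case is left untouched here, so a search for "-V" would not match "-v"; an indexer that folds case would lose that distinction.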

Some search engines let you define a set of buzzwords that should be
indexed as-is, but that's more helpful for technical writing than for
indexing code.

Finally, in Swish, if you put something like perl -V in quotes to use a
phrase search, it will most likely find what you are looking for, even if
the dash is not indexed.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: search engine module?

2001-10-15 Thread Matt J. Avitable


Hi,

> I've written a search engine that searches for jobs in a database based
> on keywords. I'm assembling a string of sql and then submitting it to
> the database based on the user's search criteria. It's working but is

It sounds like you are writing a web front end for mysql.  I'm not
sure about modules on cpan about that specifically.  If you wanted to get
a bit more fancy, you might try DBIx::FullTextSearch.

This module is nice, though mysql specific.  It creates an index
of your content (even rows in a db), and allows the user to perform
boolean searches on that index.  Word stemming is also available by
installing a separate module which FullTextSearch uses.

It's a tad sluggish when the number of rows gets to be above 40,000, but
certainly not unusable.

hth, matt


> I don't want to reinvent the wheel and I'm sure this has been done a
> zillion times, so does anyone know of a module in CPAN that I can use
> for this? I'm using MySQL on the back end and DBI under mod perl which
> runs as a handler.


-- 
## Matt J. Avitable ([EMAIL PROTECTED])
## General Partner / Programmer
## Escapement Arts And Media 

## http://www.escapement.net/
## Phone: (804) 400-0605







Re: search engine module? [drifting OT DBI related]

2001-10-15 Thread Mark Maunder

Matt J. Avitable wrote:

> Hi,
>
>> I've written a search engine that searches for jobs in a database based
>> on keywords. I'm assembling a string of sql and then submitting it to
>> the database based on the user's search criteria. It's working but is
>
> It sounds like you are writing a web front end for mysql.  I'm not
> sure about modules on cpan about that specifically.  If you wanted to get
> a bit more fancy, you might try DBIx::FullTextSearch.

Thanks. I checked out FullTextSearch on some earlier advice and it's not
exactly what I'm after, but quite useful nonetheless. I've started using
MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
fast (!!), but am waiting for a feature that's available in MySQL 4.0 (due
end of this month) that allows you to use +word and -word syntax to specify
required or unwanted keywords. Also, just as an aside, MATCH/AGAINST only
works with MyISAM tables, so I've had to convert some of mine from InnoDB at
the cost of losing transactions.
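
A minimal sketch of building such a query from Perl, assuming a MySQL version with boolean-mode full-text search and a FULLTEXT index on the columns named. The table jobs and the columns id, title and description are hypothetical, as is the function name fulltext_query.

```perl
use strict;
use warnings;

# Drop everything except word characters so that user input cannot
# smuggle in boolean-mode operators of its own.
sub _clean {
    my ($word) = @_;
    $word =~ s/\W+//g;
    return $word;
}

# Build a MATCH ... AGAINST query from lists of required and unwanted
# keywords. Returns the SQL and the single bind value for its placeholder.
sub fulltext_query {
    my ($required, $unwanted) = @_;
    my @terms = (
        (map { "+$_" } grep { length } map { _clean($_) } @$required),
        (map { "-$_" } grep { length } map { _clean($_) } @$unwanted),
    );
    my $sql = 'SELECT id, title FROM jobs '
            . 'WHERE MATCH (title, description) AGAINST (? IN BOOLEAN MODE)';
    return ($sql, join ' ', @terms);
}

my ($sql, $search) = fulltext_query(['perl', 'apache'], ['microsoft']);
print "$search\n";   # prints "+perl +apache -microsoft"

# With a connected DBI handle the query could then be run as:
#   my $rows = $dbh->selectall_arrayref($sql, undef, $search);
```

Passing the search string through a placeholder keeps the keywords out of the SQL text itself.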





Re: search engine module? [drifting OT DBI related]

2001-10-15 Thread Mark Maunder

Mark Maunder wrote:

> I've started using
> MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
> fast (!!), but am waiting for a feature that's available in MySQL 4.0 (due
> end of this month) that allows you to use +word and -word syntax to specify
> required or unwanted keywords. Also, just as an aside, MATCH/AGAINST only
> works with MyISAM tables, so I've had to convert some of mine from InnoDB at
> the cost of losing transactions.

er - lo and behold, mysql 4.0 alpha has been released a few minutes ago by
Monty.
http://www.mysql.com/downloads/mysql-4.0.html









Re: search engine module?

2001-10-15 Thread Ask Bjoern Hansen

On Fri, 12 Oct 2001, Perrin Harkins wrote:

[...]
> Plus lots of other stuff like Glimpse and Swish which interface to C-based
> engines.

I've had good luck with http://swish-e.org/2.2/


 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com




search engine module?

2001-10-12 Thread Mark Maunder

I've written a search engine that searches for jobs in a database based
on keywords. I'm assembling a string of sql and then submitting it to
the database based on the user's search criteria. It's working but is
really simple right now - it just does a logical AND with all the
keywords the user submits. I'd like to include features like the ability
to submit a query like:
(perl AND apache) OR java NOT microsoft

I don't want to reinvent the wheel and I'm sure this has been done a
zillion times, so does anyone know of a module in CPAN that I can use
for this? I'm using MySQL on the back end and DBI under mod_perl, which
runs as a handler.
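
For illustration, a query like the one above can be turned into a WHERE fragment with a small recursive-descent parser in core Perl. This is a sketch under stated assumptions, not a CPAN module: the function names are hypothetical, NOT is read as "AND NOT", blank-separated words are ANDed, AND binds tighter than OR, and each keyword becomes a LIKE test on a single column (wildcard characters in keywords are not escaped).

```perl
use strict;
use warnings;

# Returns a WHERE fragment plus the bind values for its placeholders.
sub query_to_where {
    my ($query, $column) = @_;
    my %st = (
        tokens => [ $query =~ /\(|\)|[^\s()]+/g ],
        binds  => [],
        column => $column,
    );
    my $where = _parse_or(\%st);
    die "unexpected token '$st{tokens}[0]'\n" if @{ $st{tokens} };
    return ($where, @{ $st{binds} });
}

# or-expr := and-expr ( OR and-expr )*
# No extra parentheses are needed: SQL's AND already binds tighter than OR.
sub _parse_or {
    my ($st) = @_;
    my @alts = (_parse_and($st));
    while (@{ $st->{tokens} } && $st->{tokens}[0] eq 'OR') {
        shift @{ $st->{tokens} };
        push @alts, _parse_and($st);
    }
    return join ' OR ', @alts;
}

# and-expr := atom ( [AND] [NOT] atom )*
sub _parse_and {
    my ($st) = @_;
    my $t    = $st->{tokens};
    my $sql  = _parse_atom($st);
    while (@$t && $t->[0] ne ')' && $t->[0] ne 'OR') {
        my $op = 'AND';
        shift @$t if $t->[0] eq 'AND';
        if (@$t && $t->[0] eq 'NOT') { shift @$t; $op = 'AND NOT'; }
        $sql .= " $op " . _parse_atom($st);
    }
    return $sql;
}

# atom := '(' or-expr ')' | keyword
sub _parse_atom {
    my ($st) = @_;
    my $tok = shift @{ $st->{tokens} };
    die "unexpected end of query\n" unless defined $tok;
    if ($tok eq '(') {
        my $sql   = _parse_or($st);
        my $close = shift @{ $st->{tokens} };
        die "missing ')'\n" unless defined $close && $close eq ')';
        return "($sql)";
    }
    die "unexpected token '$tok'\n" if $tok =~ /^(?:\)|AND|OR|NOT)$/;
    push @{ $st->{binds} }, "%$tok%";
    return "$st->{column} LIKE ?";
}

my ($where, @binds) =
    query_to_where('(perl AND apache) OR java NOT microsoft', 'description');
print "$where\n";
# prints "(description LIKE ? AND description LIKE ?) OR description LIKE ? AND NOT description LIKE ?"
```

The fragment and @binds can be handed to DBI as usual, for example $dbh->selectall_arrayref("SELECT id FROM jobs WHERE $where", undef, @binds), where the jobs table is again only an example. LIKE scans do not use an index, so this only suits small tables.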






Re: search engine module?

2001-10-12 Thread Perrin Harkins

> I don't want to reinvent the wheel and I'm sure this has been done a
> zillion times, so does anyone know of a module in CPAN that I can use
> for this?

Have you tried searching on http://search.cpan.org/?

DBIx::FullTextSearch
DBIx::TextIndex
Search::InvertedIndex

Plus lots of other stuff like Glimpse and Swish which interface to C-based
engines.

- Perrin