Re: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

On Fri, 19 May 2000, Randy Kobes wrote:

> On Fri, 19 May 2000, Stas Bekman wrote:
> 
> > On Fri, 19 May 2000, raptor wrote:
> > 
> > > hi,
> > > 
> > > Very interesting. A search for "statinc" returns nothing, and the box gets filled
> > > with "tatinc" instead of "statinc" ?!?!:")
> > > 
> > > This is under the KDE viewer; now I will try Netscape...!!
> > 
> > it's not the client -- it's a bug.
> > 
> > This happened after Randy made non-stemming the default. When you
> > turn stemming on, you get it right. Randy, ideas?
> 
> Hi,
> This was a bug, which was just fixed - 'statinc' now returns
> reasonable results. Also, I fixed it so a search term of
> $SIG{__DIE__}, for example, also returns some results.

Almost. When you search for it the first time, it's OK. But then you
prepend \ to $SIG{__DIE__}, and it searches for \\$SIG{__DIE__}, which
yields nothing.
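A doubled backslash like this is the typical symptom of an escaping step being applied to text that was already escaped. The C++ sketch below assumes, hypothetically, that the search script backslash-escapes '$' before querying and writes the escaped string back into the search box; the function names are invented for illustration and are not taken from the actual script.

```cpp
#include <string>

// Hypothetical escaping step (an assumption, not the Guide's actual script):
// prefix every '$' with a backslash before the term reaches the backend.
std::string escape_dollars(const std::string& term) {
    std::string out;
    for (char c : term) {
        if (c == '$') out += '\\';  // add the escape character
        out += c;
    }
    return out;
}

// Buggy round trip: the escaped term is echoed into the search box, so
// resubmitting the form escapes it a second time.
std::string resubmit_escaped(const std::string& raw) {
    const std::string shown_in_box = escape_dollars(raw);
    return escape_dollars(shown_in_box);
}

// Stable round trip: the box shows the raw input and only the copy sent
// to the backend is escaped, so every resubmission yields the same query.
std::string resubmit_raw(const std::string& raw) {
    const std::string shown_in_box = raw;
    return escape_dollars(shown_in_box);
}
```

Echoing the raw input into an HTML form still requires HTML-escaping it, which is a separate concern from the backend escaping shown here.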

'VINC' gives nothing as well :(

Looks like a try and catch game...

_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://perl.org http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org




Re: [preview] Search engine for the Guide

2000-05-19 Thread Randy Kobes

On Fri, 19 May 2000, Stas Bekman wrote:

> On Fri, 19 May 2000, raptor wrote:
> 
> > hi,
> > 
> > Very interesting. A search for "statinc" returns nothing, and the box gets filled
> > with "tatinc" instead of "statinc" ?!?!:")
> > 
> > This is under the KDE viewer; now I will try Netscape...!!
> 
> it's not the client -- it's a bug.
> 
> This happened after Randy made non-stemming the default. When you
> turn stemming on, you get it right. Randy, ideas?

Hi,
This was a bug, which was just fixed - 'statinc' now returns
reasonable results. Also, I fixed it so a search term of
$SIG{__DIE__}, for example, also returns some results.

best regards,
randy





Re: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

On Fri, 19 May 2000, raptor wrote:

> hi,
> 
> Very interesting. A search for "statinc" returns nothing, and the box gets filled
> with "tatinc" instead of "statinc" ?!?!:")
> 
> This is under the KDE viewer; now I will try Netscape...!!

it's not the client -- it's a bug.

This happened after Randy made non-stemming the default. When you
turn stemming on, you get it right. Randy, ideas?






Re: [preview] Search engine for the Guide

2000-05-19 Thread raptor

hi,

Very interesting. A search for "statinc" returns nothing, and the box gets filled
with "tatinc" instead of "statinc" ?!?!:")

This is under the KDE viewer; now I will try Netscape...!!



RE: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

> > That would be nice to see. I'm afraid I have to keep working on the guide.
> > So if there is anyone with a few free minutes on his hands, he/she might like
> > to contribute something back to the community ;)
> >
> > Ideally, when we finish tuning the search engine, we will be able
> > to make the whole site, including the Apache::ASP and Embperl pages,
> > searchable as well (with Perl-style documentation in mind).
> >
> 
> Stas,
> 
> There is already a search frontend for the Apache sites, at
> http://www.apache.org/search.html, which is also able to search under
> perl.apache.org, but if you enter mod_perl, it doesn't find anything :-(. I don't
> know if this is of any use, or who is maintaining (or not maintaining) this
> page.

Heh, look at the bottom of http://perl.apache.org/guide/index.html: the
search box from http://www.apache.org/search.html has been there since the
day the guide went online. But as you said, it's useless, as it's not suited
to the kind of documentation we have.

I've posted a request for comments about the apache.org search engine to
the asf members list but it was ignored :(





RE: [preview] Search engine for the Guide

2000-05-19 Thread Gerald Richter

>
> That would be nice to see. I'm afraid I have to keep working on the guide.
> So if there is anyone with a few free minutes on his hands, he/she might like
> to contribute something back to the community ;)
>
> Ideally, when we finish tuning the search engine, we will be able
> to make the whole site, including the Apache::ASP and Embperl pages,
> searchable as well (with Perl-style documentation in mind).
>

Stas,

There is already a search frontend for the Apache sites, at
http://www.apache.org/search.html, which is also able to search under
perl.apache.org, but if you enter mod_perl, it doesn't find anything :-(. I don't
know if this is of any use, or who is maintaining (or not maintaining) this
page.

Gerald




Re: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

On Fri, 19 May 2000, Matt Sergeant wrote:

> On Thu, 18 May 2000, Randy Kobes wrote:
> 
> > Another thing that was configured in is that words have
> > to be at least 3 characters long, which seems reasonable,
> > and also there's some stopwords that don't get indexed,
> > as they're too common. This list of stopwords is built
> > by hand - so far it only includes 'perl' and 'modperl'.
> > Also, the maximum number of hits is set at 30.
> 
> It should also index $/, etc., so limiting indexing to words of more than
> 2 characters is another broken aspect...

It seems that for Perl documentation there should be no length limit at all,
or maybe a one-character minimum is the only option...

> But I'm not complaining! It's 100% better than it was. Maybe someone
> would like to take my code for a DB-backed search engine and fix it up into
> something that could work? It's all built in Perl, so you're free to add
> and remove stopwords or change the minimum word length as you like.

That would be nice to see. I'm afraid I have to keep working on the guide,
so if there is anyone with a few free minutes on his hands, he/she might like
to contribute something back to the community ;)

Ideally, when we finish tuning the search engine, we will be able
to make the whole site, including the Apache::ASP and Embperl pages,
searchable as well (with Perl-style documentation in mind).





Re: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

On Fri, 19 May 2000, Ged Haywood wrote:

> Hi all,
> 
> On Thu, 18 May 2000, Randy Kobes wrote:
> 
> > > The '::' is stripped on the fly, since ':' cannot be used in the index, so
> > > when you look for Foo::Bar you are actually looking for 'Foo && Bar'.
> > 
> > That's a limitation of swish-e - you can configure it to
> > index characters like $, !, ... as part of a "word", but
> > the characters >, <, *, and : cannot be so indexed.
> 
> If you use swish++ 4.4, then you can change this in "config.h":
> 
> // Characters that are permissible in words: letters must be lower
> // case and upper case letters would be redundant.
> //
> char const Word_Chars[] = "&'-0123456789abcdefghijklmnopqrstuvwxyz_";
> // Characters that may be in a word.  Note that '&' is here so
> // acronyms like "AT&T" are treated as one word.  Unlike SWISH-E,
> // ';' does not need to be here to recognize and convert character
> // entity references.

Interesting. Randy, what version did you use?

Thanks Ged!





Re: [preview] Search engine for the Guide

2000-05-19 Thread Stas Bekman

On Thu, 18 May 2000, Randy Kobes wrote:

> On Fri, 19 May 2000, Stas Bekman wrote:
> 
> > On Thu, 18 May 2000, Vivek Khera wrote:
> > 
> > > Looks good... one minor issue with the stickiness of the next-search
> > > feature:
> > > 
> > > Type "lexical file handles" in your original search. The "es" at the
> > > end is lost in the next search box on the results page.
> > 
> > Yup, broken :(
> 
> Hi,
> But fixable...:) As I just mentioned, we can turn stemming
> off, or at least make it optional, so that only the full word is
> searched for. I've found stemming useful, but that's perhaps
> just the way I do searches. Should I turn it off by default to see if
> that's preferable, and then make it a configurable option?

Yup, turn it off. And have an option to turn it on. 

Thanks!





Re: [preview] Search engine for the Guide

2000-05-19 Thread Matt Sergeant

On Thu, 18 May 2000, Randy Kobes wrote:

> Another thing that was configured in is that words have
> to be at least 3 characters long, which seems reasonable,
> and also there's some stopwords that don't get indexed,
> as they're too common. This list of stopwords is built
> by hand - so far it only includes 'perl' and 'modperl'.
> Also, the maximum number of hits is set at 30.

It should also index $/, etc., so limiting indexing to words of more than
2 characters is another broken aspect...

But I'm not complaining! It's 100% better than it was. Maybe someone would
like to take my code for a DB-backed search engine and fix it up into something
that could work? It's all built in Perl, so you're free to add and remove
stopwords or change the minimum word length as you like.

-- 


Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org http://xml.sergeant.org




Re: [preview] Search engine for the Guide

2000-05-19 Thread Ged Haywood

Hi all,

On Thu, 18 May 2000, Randy Kobes wrote:

> > The '::' is stripped on the fly, since ':' cannot be used in the index, so
> > when you look for Foo::Bar you are actually looking for 'Foo && Bar'.
> 
> That's a limitation of swish-e - you can configure it to
> index characters like $, !, ... as part of a "word", but
> the characters >, <, *, and : cannot be so indexed.

If you use swish++ 4.4, then you can change this in "config.h":

// Characters that are permissible in words: letters must be lower
// case and upper case letters would be redundant.
//
char const Word_Chars[] = "&'-0123456789abcdefghijklmnopqrstuvwxyz_";
// Characters that may be in a word.  Note that '&' is here so
// acronyms like "AT&T" are treated as one word.  Unlike SWISH-E,
// ';' does not need to be here to recognize and convert character
// entity references.
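To illustrate what a Word_Chars-style setting controls, here is a small C++ sketch. It is not swish++'s actual code; the function tokenize and the two character sets are hypothetical. Input is lower-cased and split wherever a character is not in the permitted set, so with the default set quoted above Apache::Constants becomes two separate terms, while adding ':' keeps it whole.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer: a character belongs to a word only if it appears
// in word_chars (compared after lower-casing); anything else ends the word.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& word_chars) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char uc : text) {
        char c = static_cast<char>(std::tolower(uc));
        if (word_chars.find(c) != std::string::npos) {
            current += c;
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Same characters as the swish++ 4.4 default quoted above.
const std::string kDefaultChars = "&'-0123456789abcdefghijklmnopqrstuvwxyz_";
// A hypothetical Perl-friendly variant that also keeps ':' and '$'.
const std::string kPerlChars = kDefaultChars + ":$";
```

Lower-casing happens before the lookup, which is why the set only needs lower-case letters, as the swish++ comment notes.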

73,
Ged.




Re: [preview] Search engine for the Guide

2000-05-18 Thread Randy Kobes

On Thu, 18 May 2000, Jeremy Howard wrote:

> Stas Bekman <[EMAIL PROTECTED]> wrote:
> > OK, we have a preview ready for you. Randy Kobes worked hard to prepare
> > this one, so your comments are very welcome. If you like it, we'll put it
> > into production.
> > 
> > Please either keep the list CC'ed or, if you reply to me personally, make
> > sure you keep Randy CC'ed; all the kudos should go his way :)
> > 
> When I search for 'dbi' or 'DBI', it finds nothing, and the search box shows 'dby'!
> 
> It looks like it's trying to helpfully change my search pattern...
> 

Hi,
   I turned stemming on by default, which is why the search
pattern gets changed. This obviously causes confusion, so I'll
turn it off and make it manually configurable. As well, the
indexing was configured for words greater than 3 characters;
I'll reduce that to words greater than 2 characters and see if
that helps.
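The fix discussed here amounts to keeping two strings apart: what the user typed, which goes back into the search box unchanged, and the possibly stemmed terms sent to the index. A C++ sketch under those assumptions follows; toy_stem is a deliberately crude, hypothetical suffix stripper (not the stemmer swish-e uses), and SearchRequest, index_terms and box_value are invented names meant only to show the data flow.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy suffix stripper, for illustration only (NOT swish-e's stemmer): drop a
// trailing "es", "s" or "e" as long as at least 3 characters remain.
std::string toy_stem(const std::string& word) {
    const std::vector<std::string> suffixes = {"es", "s", "e"};
    for (const std::string& suffix : suffixes) {
        if (word.size() >= suffix.size() + 3 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;
}

// Hypothetical request type: the raw text the user typed plus a stemming
// switch that stays off unless the user asks for it.
struct SearchRequest {
    std::string raw;
    bool stem = false;
};

// Terms sent to the index: whitespace-separated words, stemmed only on request.
std::vector<std::string> index_terms(const SearchRequest& req) {
    std::vector<std::string> terms;
    std::istringstream in(req.raw);
    std::string word;
    while (in >> word) terms.push_back(req.stem ? toy_stem(word) : word);
    return terms;
}

// Value written back into the search box: always the user's own input.
std::string box_value(const SearchRequest& req) { return req.raw; }
```

Because box_value never looks at the stemmed terms, turning stemming on changes what is matched but not what the user sees in the next search box.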

best regards,
randy





Re: [preview] Search engine for the Guide

2000-05-18 Thread Jeremy Howard

Stas Bekman <[EMAIL PROTECTED]> wrote:
> OK, we have a preview ready for you. Randy Kobes worked hard to prepare
> this one, so your comments are very welcome. If you like it, we'll put it
> into production.
> 
> Please either keep the list CC'ed or, if you reply to me personally, make
> sure you keep Randy CC'ed; all the kudos should go his way :)
> 
When I search for 'dbi' or 'DBI', it finds nothing, and the search box shows 'dby'!

It looks like it's trying to helpfully change my search pattern...


-- 
  Jeremy Howard
  [EMAIL PROTECTED]



Re: [preview] Search engine for the Guide

2000-05-18 Thread Randy Kobes

On Fri, 19 May 2000, Stas Bekman wrote:

> On Thu, 18 May 2000, Matt Sergeant wrote:
> 
> > One more point... The indexer or the searcher (or both) has a broken
> > tokenizer for anything involving perl. Try searching for
> > Apache::Constants, for example.
> 
> That's right. It's broken :( After searching for 'Apache::Constants' I
> got 'apach constant'...

Just to expand on this: I turned stemming of words on by default
in the search, which is why the stemmed words get returned. Perhaps
it would be better to turn stemming off by default and make it
a configurable option instead?

> The '::' is stripped on the fly, since ':' cannot be used in the index, so
> when you look for Foo::Bar you are actually looking for 'Foo && Bar'.

That's a limitation of swish-e - you can configure it to
index characters like $, !, ... as part of a "word", but
the characters >, <, *, and : cannot be so indexed. So the
script silently stripped ':' out, leaving the search term
to be 'Apache' && 'Constants'. This should be mentioned
on the search page.

Another thing that was configured in is that words have
to be at least 3 characters long, which seems reasonable,
and also there's some stopwords that don't get indexed,
as they're too common. This list of stopwords is built
by hand - so far it only includes 'perl' and 'modperl'.
Also, the maximum number of hits is set at 30.
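The three settings just described (minimum word length, a hand-built stopword list, a cap on hits) can be pictured as a filter applied to candidate terms plus a truncation of the result list. The C++ sketch below is hypothetical: IndexSettings, indexable and cap_hits are invented names, and only the values 3, 'perl'/'modperl' and 30 come from the message.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical settings struct mirroring the values described above:
// minimum word length 3, a hand-built stopword list, and at most 30 hits.
struct IndexSettings {
    std::size_t min_length = 3;
    std::set<std::string> stopwords = {"perl", "modperl"};
    std::size_t max_hits = 30;
};

// Keep only the terms that would make it into the index.
std::vector<std::string> indexable(const std::vector<std::string>& terms,
                                   const IndexSettings& s) {
    std::vector<std::string> kept;
    for (const std::string& t : terms) {
        if (t.size() >= s.min_length && s.stopwords.count(t) == 0) {
            kept.push_back(t);
        }
    }
    return kept;
}

// Truncate a ranked hit list to the configured maximum.
std::vector<std::string> cap_hits(std::vector<std::string> hits,
                                  const IndexSettings& s) {
    if (hits.size() > s.max_hits) hits.resize(s.max_hits);
    return hits;
}
```

With these defaults a two-character token such as $/ is silently dropped; lowering min_length to 1 keeps it, although whether the backend can index such characters at all is a separate question, as the swish-e limitation above shows.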

best regards,
randy




Re: [preview] Search engine for the Guide

2000-05-18 Thread Stas Bekman

On Thu, 18 May 2000, Matt Sergeant wrote:

> Looks cool, except can we take the guide splitting back 1 level? It
> seems to be split on =head2's, and should be split (IMO) on =head1's. 

The reason for splitting on any =head level is that there are huge
sections under =head1 that contain many =head{2,5} subsections, and I'm
slowly reworking the Guide to make it more categorized (nested), rather
than flat as it was before (and still is).

But we have thought about this issue. Look at the links at the bottom of
each split page: they can take you to the full version as well.
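Whether to split only on =head1 or on every heading level comes down to a single threshold. A C++ sketch of that idea follows; split_pod and its max_level parameter are hypothetical and not the Guide's actual build tooling. A new chunk starts at every =headN line whose level N is at most max_level.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical splitter: start a new chunk at every "=headN" line whose level
// N is at most max_level. max_level = 1 splits only on =head1; a larger
// value also splits on deeper headings.
std::vector<std::string> split_pod(const std::string& pod, int max_level) {
    std::vector<std::string> chunks;
    std::string current;
    std::istringstream in(pod);
    std::string line;
    while (std::getline(in, line)) {
        bool is_head = line.size() > 5 && line.compare(0, 5, "=head") == 0 &&
                       line[5] >= '1' && line[5] <= '9';
        if (is_head && line[5] - '0' <= max_level && !current.empty()) {
            chunks.push_back(current);  // close the previous chunk
            current.clear();
        }
        current += line + "\n";
    }
    if (!current.empty()) chunks.push_back(current);
    return chunks;
}
```

With max_level = 1 the =head2 subsections stay inside their parent chunk; raising it yields smaller, more numerous pages, which is the trade-off being discussed here.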

> One more point... The indexer or the searcher (or both) has a broken
> tokenizer for anything involving perl. Try searching for
> Apache::Constants, for example.

That's right. It's broken :( After searching for 'Apache::Constants' I
got 'apach constant'...

The '::' is stripped on the fly, since ':' cannot be used in the index, so
when you look for Foo::Bar you are actually looking for 'Foo && Bar'.






Re: [preview] Search engine for the Guide

2000-05-18 Thread Matt Sergeant

On Fri, 19 May 2000, Stas Bekman wrote:

> OK, we have a preview ready for you. Randy Kobes worked hard to prepare
> this one, so your comments are very welcome. If you like it, we'll put it
> into production.
> 
> Please either keep the list CC'ed or, if you reply to me personally, make
> sure you keep Randy CC'ed; all the kudos should go his way :)
> 
> So:
> 
> The search is at:
> 
> http://theoryx5.uwinnipeg.ca/cgi-bin/guide-search
> 
> and the split guide is at:
> 
> http://theoryx5.uwinnipeg.ca/guide/

One more point... The indexer or the searcher (or both) has a broken
tokenizer for anything involving perl. Try searching for
Apache::Constants, for example.





Re: [preview] Search engine for the Guide

2000-05-18 Thread Matt Sergeant

On Fri, 19 May 2000, Stas Bekman wrote:

> OK, we have a preview ready for you. Randy Kobes worked hard to prepare
> this one, so your comments are very welcome. If you like it, we'll put it
> into production.
> 
> Please either keep the list CC'ed or, if you reply to me personally, make
> sure you keep Randy CC'ed; all the kudos should go his way :)
> 
> So:
> 
> The search is at:
> 
> http://theoryx5.uwinnipeg.ca/cgi-bin/guide-search
> 
> and the split guide is at:
> 
> http://theoryx5.uwinnipeg.ca/guide/

Looks cool, except can we take the guide splitting back 1 level? It seems
to be split on =head2's, and should be split (IMO) on =head1's.





[preview] Search engine for the Guide

2000-05-18 Thread Stas Bekman

OK, we have a preview ready for you. Randy Kobes worked hard to prepare
this one, so your comments are very welcome. If you like it, we'll put it
into production.

Please either keep the list CC'ed or, if you reply to me personally, make
sure you keep Randy CC'ed; all the kudos should go his way :)

So:

The search is at:

http://theoryx5.uwinnipeg.ca/cgi-bin/guide-search

and the split guide is at:

http://theoryx5.uwinnipeg.ca/guide/

Enjoy!

