Re: [request] modperl mailing lists searchable archives wanted

2001-10-16 Thread Stas Bekman

Joshua Chamas wrote:

> Stas Bekman wrote:
> 
>>dev@perl.apache.org - 2.5, but their search engines suck
>>[EMAIL PROTECTED] - none
>>[EMAIL PROTECTED] - none
>>[EMAIL PROTECTED]  - none
>>[EMAIL PROTECTED]   - 1
>>
>>
> 
> Hey Stas, 
> 
> I have the asp list getting archived at:
> 
>   http://www.mail-archive.com/asp%40perl.apache.org/


Added. Thanks Joshua

 
> Thanks for keeping up on this.  It would be nice to 
> have another search archive for the asp list too.

:)

_
Stas Bekman JAm_pH  --   Just Another mod_perl Hacker
http://stason.org/  mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/




Re: [request] modperl mailing lists searchable archives wanted

2001-10-10 Thread Joshua Chamas

Stas Bekman wrote:
> 
> dev@perl.apache.org - 2.5, but their search engines suck
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED]  - none
> [EMAIL PROTECTED]   - 1
> 

Hey Stas, 

I have the asp list getting archived at:

  http://www.mail-archive.com/asp%40perl.apache.org/

Thanks for keeping up on this.  It would be nice to 
have another search archive for the asp list too.

Josh

_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks Founder   Huntington Beach, CA  USA 
http://www.nodeworks.com    1-714-625-4051



Re: [request] modperl mailing lists searchable archives wanted

2001-10-10 Thread Stas Bekman

Bill Moseley wrote:

> Hi Stas,
> 
> I just updated the search site for Apache.org with a newer version of
> swish.  The context highlighting is a bit silly, but that can be fixed.
> I'm only caching the first 15K of text from each page for context
> highlighting.
> 
> http://search.apache.org
> 
> It seems reasonably fast (it's not running under mod_perl currently, but
> could -- if mod_perl was in that server ;).
> 
> It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org
> so the mod_perl list (and others) shouldn't be too much trouble, I'd think,
> with smaller numbers and smaller content.
> 
> It doesn't do incremental indexing at this point, which is a drawback, but
> indexing is so fast it normally doesn't matter (and there's an easy
> work-around for something like a mailing list to pick up new messages as
> they come in during the day).
> 
> Swish-e can also call a perl program which feeds docs to swish.  That makes
> it easy to parse the email into fields for something like:
> 
>   http://swish-e.org/Discussion/search/swish.cgi
> 
> which looks a lot like the Apache search site...
> 
> But, what would be needed is a good threaded mail archiver, which there are
> many to pick from, I'd expect.
> 
> 
>>Some 
>>archives are browsable, but their search engines simply suck. e.g. 
>>marc.theaimsgroup.com I think is the only one that archives 
>>[EMAIL PROTECTED], but if you try to search for a Perl string like 
>>APR::Table::FETCH it won't find anything. If you search for
>>get_dir_config it will split it into 'get', 'dir', 'config' and give you 
>>a zillion matches when you know that there are just a few.
>>
> 
> On swish you could say ":" and "_" are part of words and those would index
> as full words.  Or, just simply search for phrase: "get_dir_config" and it
> would search for the phrase "get dir config" which would probably find what
> you want.
> 
> Maybe : and _ are ok in words, but you have to think carefully about
> others.  It's more flexible to split the words and use phrases in many cases.

Hi Bill,

It's great that search.apache.org is getting a new engine, but a few 
simple tests show that the approach you've just described still doesn't 
work well. When I search for mod_perl, I mean 'mod_perl', not 'mod' and 
'perl'. There may be hundreds of pages that contain either mod_perl or 
both 'mod' and 'perl'. With the current setup, the pages with 'mod_perl' 
don't get a higher relevance than those with 'mod' and 'perl', which 
isn't good.
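
To illustrate the ranking concern, here is a minimal Python sketch. It is not 
how Swish-E or search.apache.org scores pages; the score() function, the 
exact_weight value and the two sample pages are all made up for this example. 
It only shows the idea that a page containing the whole token 'mod_perl' 
should outrank a page that merely contains 'mod' and 'perl' separately.

```python
import re

def score(doc, query, exact_weight=10):
    """Toy relevance score (hypothetical): whole-token hits count far more
    than hits on the pieces of the query split at underscores."""
    words = re.findall(r"[a-z0-9_]+", doc.lower())
    query = query.lower()
    exact_hits = words.count(query)
    # Pieces of the query, e.g. 'mod_perl' -> ['mod', 'perl'].
    parts = [p for p in query.split("_") if p]
    partial_hits = sum(words.count(p) for p in parts)
    return exact_weight * exact_hits + partial_hits

# Two invented sample pages.
doc_a = "mod_perl speeds up Perl CGI scripts"
doc_b = "A mod for the game, written in Perl; Perl is fun"

# Sort pages so the highest score comes first.
ranked = sorted([doc_b, doc_a], key=lambda d: score(d, "mod_perl"), reverse=True)
```

With a plain split-on-underscore index both pages would match, and doc_b, 
which has more 'mod'/'perl' hits, could easily come first. Weighting the 
whole token puts doc_a on top.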

Well, we have been through this already with Randy Kobes' version of the 
searchable guide (Swish-E too), which has been tuned to work with Perl 
content. You may want to ask Randy to give you the tuned configuration. 
You can compare all three search engines used at 
http://perl.apache.org/guide/#search.

Thanks Bill!





Re: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Stas Bekman

Elizabeth Mattijsen wrote:

> At 05:59 PM 10/9/01 +0800, Stas Bekman wrote:
> 
>> Please try to send links only for good archives with good search engines.
>> Thanks a bunch!
> 
> 
> Still in beta phase, and only containing Perl newsgroups, it nonetheless 
> might be interesting to check out:
> 
>   
> http://news.search.nl/style/search.en/read/category/Programming_Languages
> http://news.search.nl/style/search.en/read/category/Programming_Languages/Perl/list/page1.html
> 
> Currently refreshed 4 times a day, with searching being refreshed once a 
> day.
> 
> The site actually runs ModPerl with Matt Sergeant's LibXML and LibXSLT 
> modules.


That's cool, but I asked for links to archives of the modperl-foo lists 
that I listed in my original post (we have enough archives of the 
modperl list itself).

Thanks, Elizabeth








Re: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Stas Bekman

Geoffrey Young wrote:

>>I've just updated the archives list at 
>>http://perl.apache.org/#maillists, so here is what we have:
>>
>>dev@perl.apache.org - 2.5, but their search engines suck
>>[EMAIL PROTECTED] - none
>>[EMAIL PROTECTED] - none
>>[EMAIL PROTECTED]  - none
>>[EMAIL PROTECTED]   - 1
>>
> 
> as far as I know, nobody is archiving [EMAIL PROTECTED] either,
> which is also of interest to us mod_perl folks :)

At least: http://www.apachelabs.org/test-dev/

I'll add this link.





Re: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Elizabeth Mattijsen

At 05:59 PM 10/9/01 +0800, Stas Bekman wrote:
>Please try to send links only for good archives with good search engines.
>Thanks a bunch!

Still in beta phase, and only containing Perl newsgroups, it nonetheless 
might be interesting to check out:

   http://news.search.nl/style/search.en/read/category/Programming_Languages
   http://news.search.nl/style/search.en/read/category/Programming_Languages/Perl/list/page1.html

Currently refreshed 4 times a day, with searching being refreshed once a day.

The site actually runs ModPerl with Matt Sergeant's LibXML and LibXSLT modules.




Elizabeth Mattijsen

Note: I am the main developer of this website, so I am prejudiced  ;-)




RE: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Geoffrey Young


> 
> I've just updated the archives list at 
> http://perl.apache.org/#maillists, so here is what we have:
> 
> dev@perl.apache.org - 2.5, but their search engines suck
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED]  - none
> [EMAIL PROTECTED]   - 1

as far as I know, nobody is archiving [EMAIL PROTECTED] either,
which is also of interest to us mod_perl folks :)

--Geoff



Re: [request] modperl mailing lists searchable archives wanted

2001-10-09 Thread Bill Moseley

Hi Stas,

I just updated the search site for Apache.org with a newer version of
swish.  The context highlighting is a bit silly, but that can be fixed.
I'm only caching the first 15K of text from each page for context
highlighting.
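
As a rough illustration of that caching idea, here is a minimal Python 
sketch. It is not the code behind search.apache.org; the function names, the 
bracket-style highlighting and the reading of "15K" as 15 * 1024 characters 
are assumptions made for this example.

```python
CACHE_LIMIT = 15 * 1024  # assumption: "15K" taken as 15 * 1024 characters

def cache_text(text, limit=CACHE_LIMIT):
    """Keep only the start of the page text for later highlighting."""
    return text[:limit]

def snippet(cached, term, width=40):
    """Return the text around the first hit, with the hit in [brackets].

    If the term only occurs beyond the cached part, there is nothing to
    highlight, so fall back to the start of the cached text.
    """
    pos = cached.lower().find(term.lower())
    if pos == -1:
        return cached[: 2 * width]
    end_of_hit = pos + len(term)
    start = max(0, pos - width)
    end = min(len(cached), end_of_hit + width)
    return cached[start:pos] + "[" + cached[pos:end_of_hit] + "]" + cached[end_of_hit:end]

short_page = "find the needle here"
long_page = "x" * 20000 + "needle"  # the hit lies past the cache limit

hit = snippet(cache_text(short_page), "needle")
miss = snippet(cache_text(long_page), "needle")
```

The trade-off is visible in the second case: a page can match the query in 
the index and still show no highlighted context, because the match falls 
outside the cached text.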

http://search.apache.org

It seems reasonably fast (it's not running under mod_perl currently, but
could -- if mod_perl was in that server ;).

It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org
so the mod_perl list (and others) shouldn't be too much trouble, I'd think,
with smaller numbers and smaller content.

It doesn't do incremental indexing at this point, which is a drawback, but
indexing is so fast it normally doesn't matter (and there's an easy
work-around for something like a mailing list to pick up new messages as
they come in during the day).

Swish-e can also call a perl program which feeds docs to swish.  That makes
it easy to parse the email into fields for something like:

  http://swish-e.org/Discussion/search/swish.cgi

which looks a lot like the Apache search site...
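
The parsing step could look roughly like the Python sketch below (the 
program mentioned above would be Perl; Python is used here only for 
illustration). It covers only splitting a message into fields and does not 
show how the result is handed to swish. The field names and the sample 
message are invented for this example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Invented sample message, not taken from any real list.
RAW = """\
From: Example Poster <poster@example.org>
To: list@example.org
Subject: Re: searchable archives wanted
Date: Tue, 09 Oct 2001 17:59:00 +0800
Message-ID: <abc123@example.org>

Body text that would be indexed.
"""

def message_to_fields(raw):
    """Split one plain-text message into separately searchable fields."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    date_header = msg.get("Date")
    return {
        "from": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "date": parsedate_to_datetime(date_header).isoformat() if date_header else "",
        "message_id": msg.get("Message-ID", ""),
        # Multipart messages return a list here; this sketch skips them.
        "body": body if isinstance(body, str) else "",
    }

fields = message_to_fields(RAW)
```

Keeping subject, sender and date as separate fields is what would let a 
search form restrict a query to, say, subjects only.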

But, what would be needed is a good threaded mail archiver, which there are
many to pick from, I'd expect.

>Some 
>archives are browsable, but their search engines simply suck. e.g. 
>marc.theaimsgroup.com I think is the only one that archives 
>[EMAIL PROTECTED], but if you try to search for a Perl string like 
>APR::Table::FETCH it won't find anything. If you search for
>get_dir_config it will split it into 'get', 'dir', 'config' and give you 
>a zillion matches when you know that there are just a few.

On swish you could say ":" and "_" are part of words and those would index
as full words.  Or, just simply search for phrase: "get_dir_config" and it
would search for the phrase "get dir config" which would probably find what
you want.

Maybe : and _ are ok in words, but you have to think carefully about
others.  It's more flexible to split the words and use phrases in many cases.
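
The two options can be sketched in a few lines of Python. This is not swish 
code or configuration; tokenize() and contains_phrase() are hypothetical 
helpers that only mimic the two behaviours described above.

```python
import re

def tokenize(text, extra_word_chars=""):
    """Split text into lowercase tokens.

    Characters in extra_word_chars are treated as part of a word;
    everything else outside A-Z, a-z and 0-9 separates words.
    """
    pattern = "[A-Za-z0-9" + re.escape(extra_word_chars) + "]+"
    return [t.lower() for t in re.findall(pattern, text)]

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occur as consecutive tokens, in order."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

sample = "Call get_dir_config or APR::Table::FETCH"

# Option 1: ":" and "_" are part of words, so identifiers stay whole.
whole_tokens = tokenize(sample, extra_word_chars="_:")

# Option 2: split on ":" and "_", then search for the pieces as a phrase.
split_tokens = tokenize(sample)
found = contains_phrase(split_tokens, tokenize("get_dir_config"))
```

With option 2 a query for the single word "config" still finds the 
document, which is the flexibility argument. With option 1 it only matches 
the full identifier.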



Bill Moseley
mailto:[EMAIL PROTECTED]