Re: [request] modperl mailing lists searchable archives wanted
Joshua Chamas wrote:
> Stas Bekman wrote:
>
>> dev@@perl.apache.org - 2.5, but their search engines suck
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - 1
>
> Hey Stas,
>
> I have the asp list getting archived at:
>
> http://www.mail-archive.com/asp%40perl.apache.org/

Added. Thanks Joshua

> Thanks for keeping up on this. It would be nice to
> have another search archive for the asp list too.

:)

Stas Bekman   JAm_pH -- Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]   http://ticketmaster.com   http://apacheweek.com
http://singlesheaven.com   http://perl.apache.org   http://perlmonth.com/
Re: [request] modperl mailing lists searchable archives wanted
Stas Bekman wrote:
>
> dev@@perl.apache.org - 2.5, but their search engines suck
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - 1
>

Hey Stas,

I have the asp list getting archived at:

http://www.mail-archive.com/asp%40perl.apache.org/

Thanks for keeping up on this. It would be nice to
have another search archive for the asp list too.

Josh

Joshua Chamas               Chamas Enterprises Inc.
NodeWorks Founder           Huntington Beach, CA USA
http://www.nodeworks.com    1-714-625-4051
Re: [request] modperl mailing lists searchable archives wanted
Bill Moseley wrote:
> Hi Stas,
>
> I just updated the search site for Apache.org with a newer version of
> swish. The context highlighting is a bit silly, but that can be fixed.
> I'm only caching the first 15K of text from each page for context
> highlighting.
>
> http://search.apache.org
>
> It seems reasonably fast (it's not running under mod_perl currently, but
> could -- if mod_perl was in that server ;).
>
> It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org
> so the mod_perl list (and others) shouldn't be too much trouble, I'd think,
> with smaller numbers and smaller content.
>
> It doesn't do incremental indexing at this point, which is a drawback, but
> indexing is so fast it normally doesn't matter (and there's an easy
> work-around for something like a mailing list to pick up new messages as
> they come in during the day).
>
> Swish-e can also call a perl program which feeds docs to swish. That makes
> it easy to parse the email into fields for something like:
>
> http://swish-e.org/Discussion/search/swish.cgi
>
> which looks a lot like the Apache search site...
>
> But, what would be needed is a good threaded mail archiver, which there are
> many to pick from, I'd expect.
>
>> Some
>> archives are browsable, but their search engines simply suck. e.g.
>> marc.theaimsgroup.com I think is the only one that archives
>> [EMAIL PROTECTED], but if you try to search for a perl string like
>> APR::Table::FETCH it won't find anything. If you search for
>> get_dir_config it will split it into 'get', 'dir', 'config' and give you
>> a zillion matches when you know that there are just a few.
>
> On swish you could say ":" and "_" are part of words and those would index
> as full words. Or, just simply search for phrase: "get_dir_config" and it
> would search for the phrase "get dir config" which would probably find what
> you want.
>
> Maybe : and _ are ok in words, but you have to think carefully about
> others. It's more flexible to split the words and use phrases in many cases.

Hi Bill,

It's great that search.apache.org gets a new engine, but a few simple tests show it still doesn't handle the case you've just described very well. When I search for mod_perl, I want 'mod_perl', not 'mod' and 'perl'. There may be hundreds of pages that contain either mod_perl or the separate words 'mod' and 'perl', and with the current setup the pages with 'mod_perl' don't get higher relevance than the pages with 'mod' and 'perl'. So it's not good.

We have been through this already with Randy Kobes's version of the searchable guide (Swish-E too), which has been tuned to work with Perl content. You may want to ask Randy for the tuned configuration. You can compare all three search engines used at http://perl.apache.org/guide/#search.

Thanks Bill!

Stas Bekman
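The relevance complaint above can be illustrated with a toy example. What follows is a minimal Python sketch, not Swish-e's actual tokenizer or ranking: `make_tokenizer`, `score`, and the two sample documents are hypothetical names and data made up for illustration. It compares a tokenizer that splits on "_" with one that treats "_" as a word character.

```python
import re
from collections import Counter


def make_tokenizer(extra_word_chars=""):
    """Build a tokenizer where letters, digits and extra_word_chars form words."""
    pattern = re.compile("[A-Za-z0-9" + re.escape(extra_word_chars) + "]+")

    def tokenize(text):
        return [token.lower() for token in pattern.findall(text)]

    return tokenize


def score(query, doc, tokenize):
    """Toy relevance: how many document tokens equal any query token."""
    query_tokens = set(tokenize(query))
    counts = Counter(tokenize(doc))
    return sum(counts[token] for token in query_tokens)


# Hypothetical sample pages, invented for this sketch.
docs = {
    "a": "Installing mod_perl on Apache",
    "b": "The mod function in perl: perl has a mod operator",
}

split_tok = make_tokenizer()      # "_" separates words
whole_tok = make_tokenizer("_")   # "_" is part of a word

split_scores = {name: score("mod_perl", text, split_tok) for name, text in docs.items()}
whole_scores = {name: score("mod_perl", text, whole_tok) for name, text in docs.items()}

print(split_scores)
print(whole_scores)
```

With these sample pages, the splitting tokenizer gives the unrelated page "b" a higher score (4) than the mod_perl page "a" (2), while the tokenizer that keeps "_" inside words matches only page "a". A real engine ranks with more than raw counts, so this only shows the direction of the effect.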
Re: [request] modperl mailing lists searchable archives wanted
Elizabeth Mattijsen wrote:
> At 05:59 PM 10/9/01 +0800, Stas Bekman wrote:
>
>> Please try to send links only for good archives with good search engines.
>> Thanks a bunch!
>
> Still in beta phase, and only containing Perl newsgroups, it nonetheless
> might be interesting to check out:
>
> http://news.search.nl/style/search.en/read/category/Programming_Languages
> http://news.search.nl/style/search.en/read/category/Programming_Languages/Perl/list/page1.html
>
> Currently refreshed 4 times a day, with searching being refreshed once a
> day.
>
> The site actually runs ModPerl with Matt Sergeant's LibXML and LibXSLT
> modules.

That's cool, but I asked for links to archives of the modperl-foo lists that I listed in my original post (we have enough archives of the modperl list itself).

Thanks, Elizabeth

Stas Bekman
Re: [request] modperl mailing lists searchable archives wanted
Geoffrey Young wrote:
>> I've just updated the archives list at
>> http://perl.apache.org/#maillists, so here is what we have:
>>
>> dev@@perl.apache.org - 2.5, but their search engines suck
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - none
>> [EMAIL PROTECTED] - 1
>
> as far as I know, nobody is archiving [EMAIL PROTECTED] either,
> which is also of interest to us mod_perl folks :)

At least: http://www.apachelabs.org/test-dev/

I'll add this link.

Stas Bekman
Re: [request] modperl mailing lists searchable archives wanted
At 05:59 PM 10/9/01 +0800, Stas Bekman wrote:
>Please try to send links only for good archives with good search engines.
>Thanks a bunch!

Still in beta phase, and only containing Perl newsgroups, it nonetheless might be interesting to check out:

http://news.search.nl/style/search.en/read/category/Programming_Languages
http://news.search.nl/style/search.en/read/category/Programming_Languages/Perl/list/page1.html

Currently refreshed 4 times a day, with searching being refreshed once a day.

The site actually runs ModPerl with Matt Sergeant's LibXML and LibXSLT modules.

Elizabeth Mattijsen

Note: I am the main developer of this website, so I am prejudiced ;-)
RE: [request] modperl mailing lists searchable archives wanted
> I've just updated the archives list at
> http://perl.apache.org/#maillists, so here is what we have:
>
> dev@@perl.apache.org - 2.5, but their search engines suck
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - none
> [EMAIL PROTECTED] - 1

as far as I know, nobody is archiving [EMAIL PROTECTED] either,
which is also of interest to us mod_perl folks :)

--Geoff
Re: [request] modperl mailing lists searchable archives wanted
Hi Stas,

I just updated the search site for Apache.org with a newer version of swish. The context highlighting is a bit silly, but that can be fixed. I'm only caching the first 15K of text from each page for context highlighting.

http://search.apache.org

It seems reasonably fast (it's not running under mod_perl currently, but could -- if mod_perl was in that server ;).

It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org so the mod_perl list (and others) shouldn't be too much trouble, I'd think, with smaller numbers and smaller content.

It doesn't do incremental indexing at this point, which is a drawback, but indexing is so fast it normally doesn't matter (and there's an easy work-around for something like a mailing list to pick up new messages as they come in during the day).

Swish-e can also call a perl program which feeds docs to swish. That makes it easy to parse the email into fields for something like:

http://swish-e.org/Discussion/search/swish.cgi

which looks a lot like the Apache search site...

But, what would be needed is a good threaded mail archiver, which there are many to pick from, I'd expect.

>Some
>archives are browsable, but their search engines simply suck. e.g.
>marc.theaimsgroup.com I think is the only one that archives
>[EMAIL PROTECTED], but if you try to search for a perl string like
>APR::Table::FETCH it won't find anything. If you search for
>get_dir_config it will split it into 'get', 'dir', 'config' and give you
>a zillion matches when you know that there are just a few.

On swish you could say ":" and "_" are part of words and those would index as full words. Or, just simply search for phrase: "get_dir_config" and it would search for the phrase "get dir config" which would probably find what you want.

Maybe : and _ are ok in words, but you have to think carefully about others. It's more flexible to split the words and use phrases in many cases.

Bill Moseley
mailto:[EMAIL PROTECTED]
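The phrase-search approach described above (split "get_dir_config" into words, then require them to appear next to each other) can be sketched with a small positional index. This is a simplified Python illustration under stated assumptions, not how Swish-e is implemented: `build_index`, `phrase_search`, `all_words_search`, and the three sample documents are hypothetical and exist only for this sketch.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z0-9]+")  # ":" and "_" act as word separators


def tokenize(text):
    return [token.lower() for token in WORD.findall(text)]


def build_index(docs):
    """Map each word to {doc_id: [positions of that word in the doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def all_words_search(index, query):
    """Docs containing every query word, anywhere and in any order."""
    words = tokenize(query)
    if not words:
        return []
    doc_sets = [set(index.get(word, {})) for word in words]
    return sorted(set.intersection(*doc_sets))


def phrase_search(index, query):
    """Docs containing the query words consecutively and in order."""
    words = tokenize(query)
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word i of the phrase must sit at position start + i.
            if all(start + i in index.get(word, {}).get(doc_id, [])
                   for i, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample messages, invented for this sketch.
docs = {
    1: "Use get_dir_config to read the directory config",
    2: "To get the config, dir into the config directory",
    3: "APR::Table::FETCH is called on tied access",
}
index = build_index(docs)

print(all_words_search(index, "get_dir_config"))
print(phrase_search(index, "get_dir_config"))
print(phrase_search(index, "APR::Table::FETCH"))
```

On this sample data the any-order search matches messages 1 and 2, while the phrase search narrows "get_dir_config" to message 1 and finds "APR::Table::FETCH" in message 3, even though neither ":" nor "_" is treated as a word character.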