Re: Guide search engine (was Re: multiple copies of a module)
On Wed, 17 May 2000, Jeremy Howard wrote:

> > > ...the perl.apache.org search facility
> > > * Where is it? (doing a Find on the front page doesn't show it)
> > At the bottom of all guide pages.
>
> How funny--I'd never even noticed it! I see that it's using 'Swish-E' http://sunsite.berkeley.edu/SWISH-E/. Stas--did you get that up and running? Can we tailor it for our needs?
>
> Here's an attempt at listing what I think we've decided we should aim for:
> - Allow restriction of search to just the guide
> - Allow searching of other documents through a popup selection (probably make the guide the default?)
> - Highlight found words
> - Try and index in a way that suits programmers, not English writers, e.g. include @, %, $, :: in indexed words.
>
> Have I missed anything? (I'm ignoring the docbook issue for the moment since it's not directly related, and I guess it's really Stas' call anyhow.)

So far these are the engines that we are going to deploy:

1st search: Randy Kobes: swish engine + perl filters
http://theoryx5.uwinnipeg.ca/cgi-bin/guide-search

2nd search: Vivek Khera: nextrieve engine
http://thingy.kcilink.com/cgi-bin/modperlguide.cgi

Both more or less cover the demands from your wishlist and mine. I'll link to these from the Guide.

You are welcome to present other search engines if you think you can get a better one. I promise to link to all of them, assuming that you will take the responsibility to keep up with updates. I'll delete all the references to search engines which do not update their indexed version. I did that before with one quite good search engine that didn't keep up with updates and had a half-year-old version, and users were using a *very* outdated guide as a result.

> I would have thought the best bet would be to put it on the footer of every perl.apache.org page. A popup which allows selecting a subset of the site might default to either 'whole site' or 'mod_perl Guide', or maybe it changes its default to whatever part of the site is currently being viewed...
>
> The outstanding issues, I believe, are:
> - Who looks after the perl.apache.org search facility? Are they happy to expand its functionality as described?
> - What tool? Potential options so far are Swish-e, htdig, or custom Perl (perhaps based on Matt's engine). Any of these could be piped through a word-highlighting filter.
> - What's the best 1st step? i.e. How can we get a simple search going quickly, while providing the foundation for a more complete system down the track?
> - Who's going to do the actual work?
>
> As I've mentioned, if a machine is required, I'm happy to provide it. However, I don't have the experience in this area to lead the work--although of course I'll contribute where I can! It would be nice to get a private mailing list going to avoid filling up this list too much more. Anyone who's interested in getting involved, email me, and I'll ensure that I add your name to the list. You don't have to be a programming guru, of course... there are always plenty of ways to get involved in these things.

Well, things are happening already. Vivek and Randy have created their versions and presented them on the list, received feedback, made corrections, and have the engines working. As I've mentioned above, you are very welcome to beat their achievement and get an even better engine :)

P.S. Asking "who is going to do that" is a bad idea on this list... I'm not bitching, just stating a fact. If you want something to be done, either ask for help or do it yourself.

_
Stas Bekman              JAm_pH -- Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED] http://perl.org http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org
Re: multiple copies of a module
At 11:18 AM 5/17/00 +0300, Stas Bekman wrote:
> On Wed, 17 May 2000, Gunther Birznieks wrote:
> > I am curious as to why you don't care for 20 different Apaches? If you use a mod_proxy front-end, it should be relatively easy to manage 20 different Apaches on the backend, especially if you use variables to start them up. There is another command line parameter that can be used to trigger different code in the same conf file (so that they start on different ports, for example)...
> >
> > In addition, a solid part of testing mod_perl modules consists of running in single process mode (à la the -X parameter) -- which is invaluable for finding cached code conflicts. You won't be able to do this if everyone is using the same Apache.
> >
> > BTW, this would be a good addition to the guide -- how to manage a mod_perl development environment with more than 1 developer (e.g. 20 in your case)
>
> Both questions are already answered in the guide:
> Kees' original: http://perl.apache.org/guide/modules.html#Apache_PerlVINC_set_a_differe
> Gunther's suggestion: http://perl.apache.org/guide/control.html#Starting_a_Personal_Server_for_E
> :)

Excellent!

One thing though is that both these suggestions actually tie together to solve a practical problem, Managing Multiple Developers, yet they are located in two different chapters. I think these sections belong where they do, but perhaps a separate discussion linking these sections (and any other relevant ones) would be useful.

__
Gunther Birznieks ([EMAIL PROTECTED])
Extropia - The Web Technology Company
http://www.extropia.com/
Re: multiple copies of a module
On Wed, 17 May 2000, Gunther Birznieks wrote:

> At 11:18 AM 5/17/00 +0300, Stas Bekman wrote:
> > On Wed, 17 May 2000, Gunther Birznieks wrote:
> > > I am curious as to why you don't care for 20 different Apaches? If you use a mod_proxy front-end, it should be relatively easy to manage 20 different Apaches on the backend, especially if you use variables to start them up. There is another command line parameter that can be used to trigger different code in the same conf file (so that they start on different ports, for example)...
> > >
> > > In addition, a solid part of testing mod_perl modules consists of running in single process mode (à la the -X parameter) -- which is invaluable for finding cached code conflicts. You won't be able to do this if everyone is using the same Apache.
> > >
> > > BTW, this would be a good addition to the guide -- how to manage a mod_perl development environment with more than 1 developer (e.g. 20 in your case)
> >
> > Both questions are already answered in the guide:
> > Kees' original: http://perl.apache.org/guide/modules.html#Apache_PerlVINC_set_a_differe
> > Gunther's suggestion: http://perl.apache.org/guide/control.html#Starting_a_Personal_Server_for_E
> > :)
>
> Excellent! One thing though is that both these suggestions actually tie together to solve a practical problem, Managing Multiple Developers, yet they are located in two different chapters. I think these sections belong where they do, but perhaps a separate discussion linking these sections (and any other relevant ones) would be useful.

These two are in fact disconnected. During development you need just a few processes per developer, so you start a separate server for each developer -- and that's what the last link suggests. It shows how to use the front-end machine to solve the problem when all the servers are running on the same machine. There is no reason to run virtual hosts for each developer; different ports solve this issue much better and more simply. Of course I'd be glad to document other approaches if you are willing to share them.
_
Stas Bekman
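Stas's per-developer setup uses one small backend server per developer, each on its own port, with a front-end server proxying to them. It can be sketched as a small config generator. The sketch below assumes a hypothetical `/<developer>/` URL prefix and hypothetical port numbers starting at 8001. It is not the recipe from the Guide's control.html section. Each backend would still need its own configuration telling it to listen on its assigned port. `ProxyPass` and `ProxyPassReverse` are the standard mod_proxy directives.

```python
def assign_ports(developers, base_port=8000):
    """Give each developer a distinct backend port (hypothetical numbering scheme)."""
    return {dev: base_port + i for i, dev in enumerate(sorted(developers), start=1)}


def proxy_config(developers, base_port=8000):
    """Front-end mod_proxy lines routing /<dev>/ to that developer's own backend httpd."""
    lines = []
    for dev, port in sorted(assign_ports(developers, base_port).items()):
        target = f"http://localhost:{port}/"
        lines.append(f"ProxyPass /{dev}/ {target}")
        lines.append(f"ProxyPassReverse /{dev}/ {target}")
    return lines


if __name__ == "__main__":
    # Hypothetical developer names; paste the output into the front-end httpd.conf.
    print("\n".join(proxy_config(["alice", "bob", "carol"])))
```

Sorting the names keeps the port assignment stable between runs, so a developer's port only changes when someone earlier in the alphabet joins or leaves. A checked-in name-to-port table would avoid even that.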
Re: multiple copies of a module
Stas Bekman wrote:
> Both questions are already answered in the guide:
> Kees' original: http://perl.apache.org/guide/modules.html#Apache_PerlVINC_set_a_differe
> Gunther's suggestion: http://perl.apache.org/guide/control.html#Starting_a_Personal_Server_for_E

Thank you very much, this is very helpful (I love the guide, even though it is sometimes difficult to find something in it).

I think I would like to go the Apache::PerlVINC way, as our testing box is very heavily loaded (due to all kinds of non-Apache testing going on). I would never get permission to start 20 Apache servers.

However the URL in the guide:
http://perl.apache.org/~dougm/Apache-PerlVINC-0.01.tar.gz
does not exist. Is there any other place where I can find Apache::PerlVINC?

Thanks,
Kees
Re: multiple copies of a module
> Stas Bekman wrote:
> > Both questions are already answered in the guide:
> > Kees' original: http://perl.apache.org/guide/modules.html#Apache_PerlVINC_set_a_differe
> > Gunther's suggestion: http://perl.apache.org/guide/control.html#Starting_a_Personal_Server_for_E
>
> Thank you very much, this is very helpful (I love the guide, even though it is sometimes difficult to find something in it).

Hold on, at this very moment a few mod_perl fellas are working on a good search engine for the guide. Just give it some more time; I'm trying to bring the best, so it'll take a while...

> I think I would like to go the Apache::PerlVINC way, as our testing box is very heavily loaded (due to all kinds of non-Apache testing going on). I would never get permission to start 20 Apache servers.
>
> However the URL in the guide:
> http://perl.apache.org/~dougm/Apache-PerlVINC-0.01.tar.gz
> does not exist. Is there any other place where I can find Apache::PerlVINC?

Indeed, it's not there. Doug?

_
Stas Bekman
Guide search engine (was Re: multiple copies of a module)
Stas Bekman wrote:
> Hold on, at this very moment a few mod_perl fellas are working on a good search engine for the guide. Just give it some more time; I'm trying to bring the best, so it'll take a while...

I'm glad you brought this up again. Since I mentioned I'd be happy to host such a thing, and asked for suggestions, I've got a total of one (from Stas--thanks!). That suggestion was to use ht://dig http://www.htdig.org/.

Has anyone got a search engine up and running that they're happy with? Stas has made the good point that it needs to be able to highlight found words, since the pages are quite large. If anyone has a chance to do a bit of research about (free) search engines, I'd really appreciate it if you could let me know what you find out. It'd be nice publicity if it was mod_perl based, I guess, but it doesn't really matter.

My only concern is that it seems a little odd to keep this just to the Guide. Wouldn't it be useful for the rest of perl.apache.org? I wouldn't have thought it's much extra work to add a drop-down box to search specific areas of the site (the Guide being one)...

If there's a good reason to have the Guide's search engine separate to the rest of perl.apache.org, should it have a separate domain (modperlguide.org?, guide.perl.apache.org?)?

--
Jeremy Howard
[EMAIL PROTECTED]
Re: Guide search engine (was Re: multiple copies of a module)
BTW: Your email client is broken and not wrapping words.

On Wed, 17 May 2000, Jeremy Howard wrote:
> Stas Bekman wrote:
> > Hold on, at this very moment a few mod_perl fellas are working on a good search engine for the guide. Just give it some more time; I'm trying to bring the best, so it'll take a while...
>
> I'm glad you brought this up again. Since I mentioned I'd be happy to host such a thing, and asked for suggestions, I've got a total of one (from Stas--thanks!). That suggestion was to use ht://dig http://www.htdig.org/.

While htdig is a reasonable engine, Stas's idea is that this needs to be "guide specific". I'm not sure what that means, but I'm assuming it means picking out only certain words to index...

> Has anyone got a search engine up and running that they're happy with?

I just wrote a very simple SQL based engine, so I would say I'm happy with that. It's fast and it's all in perl. I could very simply rip out the search parts of the code for someone to play with if they wanted to.

> Stas has made the good point that it needs to be able to highlight found words, since the pages are quite large. If anyone has a chance to do a bit of research about (free) search engines, I'd really appreciate it if you could let me know what you find out. It'd be nice publicity if it was mod_perl based, I guess, but it doesn't really matter.

I think word highlighting is overrated. It's only necessary in this case because the guide is so damn huge now. The size problem could be eliminated by making the guide split itself up into smaller sections. My proposal would be to do that by converting the guide to docbook XML and using AxKit to display the resulting docbook pages. The AxKit docbook stylesheets are nice and friendly, and written in Perl, not some obscure XML stylesheet language. And after all that, it would make converting the guide to a format O'Reilly likes to publish (i.e. docbook) trivial.

> My only concern is that it seems a little odd to keep this just to the Guide. Wouldn't it be useful for the rest of perl.apache.org? I wouldn't have thought it's much extra work to add a drop-down box to search specific areas of the site (the Guide being one)...

perl.apache.org already has a search engine.

> If there's a good reason to have the Guide's search engine separate to the rest of perl.apache.org, should it have a separate domain (modperlguide.org?, guide.perl.apache.org?)?

guide.modperl.org ?

--
Matt/
Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org http://xml.sergeant.org
Re: Guide search engine (was Re: multiple copies of a module)
Jeremy Howard wrote:
> I'm glad you brought this up again. Since I mentioned I'd be happy to host such a thing, and asked for suggestions, I've got a total of one (from Stas--thanks!). That suggestion was to use ht://dig http://www.htdig.org/.
>
> Has anyone got a search engine up and running that they're happy with? Stas has made the good point that it needs to be able to highlight found words, since the pages are quite large. If anyone has a chance to do a bit of research about (free) search engines, I'd really appreciate it if you could let me know what you find out. It'd be nice publicity if it was mod_perl based, I guess, but it doesn't really matter.

I'm happy with ht://dig. I use it mainly for looking up docs I've squirreled away in /manual (instead of grep).

It's been a while since I've been to htdig.org, but I did grab a tarball recently, so I'm fairly confident there *isn't* an existing mod_perl wrapper -- but maybe there should be. There are a number of Perl scripts in the distribution, and I *thought* there was a plain Perl wrapper, but I could be mistaken.

I think a mod_perl frontend/wrapper could work well. htsearch is about 900K+ and takes a moment to fire up (on my box anyway) -- how much worse could it be? OTOH, one *could* (conceivably) get crazy and access the DBs directly and maybe XS any needed portions of htsearch (ambitious :-). However, this still leaves htdig, htfuzzy, htmerge, etc. to handle the indexing.

As far as highlighting goes, I have a piece of code I'm using -- we could use it as a starting point. The downside is that it uses $` and $' (it can probably be tweaked to avoid this), but it handles the critical stuff like skipping keywords within hrefs/tags, etc.

RE: Matt Sergeant -- Perhaps highlighting is overrated, but it usually doesn't hurt. I too have a proprietary search facility, and an inverted indexing prototype (it stores packed doc-id integers in MySQL, for example) -- but a great deal of work has gone into ht://dig.
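Jay's highlighter wasn't posted. The following is a minimal Python sketch of the same idea, the kind of filter search results could be chained through. It wraps query words in `<b>` tags but leaves everything inside HTML tags (such as `href` values) alone. It assumes simple markup with no `>` inside attribute values. The `<b>` markup is a hypothetical choice.

```python
import re

# re.split with one capturing group keeps the tags: odd indices are tags, even are text.
TAG = re.compile(r"(<[^>]*>)")


def highlight(html, words, start="<b>", end="</b>"):
    """Wrap query words found in the text of an HTML page, never inside a tag."""
    if not words:
        return html
    # Longest words first so "mod_perl" is preferred over "mod";
    # re.escape keeps characters such as $ and :: literal.
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    # Lookarounds instead of \b so that words starting with $ or @ still match.
    word_re = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)", re.IGNORECASE)
    pieces = TAG.split(html)
    for i in range(0, len(pieces), 2):  # even indices: text between tags
        pieces[i] = word_re.sub(lambda m: start + m.group(1) + end, pieces[i])
    return "".join(pieces)
```

For example, `highlight('<a href="/guide/mod_perl.html">mod_perl guide</a>', ["mod_perl"])` marks only the link text, not the URL. The original case of each match is kept because the replacement reuses the matched text.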
> My only concern is that it seems a little odd to keep this just to the Guide. Wouldn't it be useful for the rest of perl.apache.org? I wouldn't have thought it's much extra work to add a drop-down box to search specific areas of the site (the Guide being one)...

I'd have to agree there.

> If there's a good reason to have the Guide's search engine separate to the rest of perl.apache.org, should it have a separate domain (modperlguide.org?, guide.perl.apache.org?)?
> --
> Jeremy Howard
> [EMAIL PROTECTED]

ht://dig allows for the param 'restrict' = /to_this_directory, which might be useful for separating things.

Count me in, whatever we choose.

-Jay J
# use Text::Wrapper;
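Two scoping ideas come up in the thread: Jay's pointer to ht://dig's 'restrict' parameter, and Jeremy's drop-down that defaults to the part of the site currently being viewed. Both amount to filtering results by URL prefix. Below is a minimal sketch, assuming hypothetical scope names and prefixes. It post-filters a result list rather than using ht://dig's own option, whose exact matching rules are not described here.

```python
# Hypothetical popup values mapped to URL prefixes on perl.apache.org.
SCOPES = {
    "site": "http://perl.apache.org/",
    "guide": "http://perl.apache.org/guide/",
}


def restrict(urls, scope):
    """Keep only the result URLs that fall under the selected part of the site."""
    prefix = SCOPES[scope]
    return [u for u in urls if u.startswith(prefix)]


def default_scope(current_url):
    """Default the popup to the most specific scope containing the page being viewed."""
    matches = [name for name, prefix in SCOPES.items() if current_url.startswith(prefix)]
    if not matches:
        return "site"
    return max(matches, key=lambda name: len(SCOPES[name]))
```

Picking the longest matching prefix means a page under /guide/ defaults to "guide" even though it is also under the whole-site prefix.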
Re: Guide search engine (was Re: multiple copies of a module)
Jeremy Howard wrote:
> I'm glad you brought this up again. Since I mentioned I'd be happy to host such a thing, and asked for suggestions, I've got a total of one (from Stas--thanks!). That suggestion was to use ht://dig http://www.htdig.org/.
>
> Has anyone got a search engine up and running that they're happy with? Stas has made the good point that it needs to be able to highlight found words, since the pages are quite large. If anyone has a chance to do a bit of research about (free) search engines, I'd really appreciate it if you could let me know what you find out. It'd be nice publicity if it was mod_perl based, I guess, but it doesn't really matter.

I know this is absolute anathema, considering you guys are developers, but... Have you looked at www.atomz.com, at least as a temporary solution? (It's a free service for sites with fewer than 500 pages.)

Basically, the search brings up their page, but you can customize it to look just like one of yours. It truly is fast (as hell) and flexible, and it does highlight found words. It even does soundalikes in the absence of other matches. The result page will show their logo, though, but it's rather unobtrusive. (The biggest drawback, as a long-term solution, is that if you change the look of your pages, you have one more maintenance chore: you have to go over to atomz.com and change your result page there as well.)

O'Reilly uses it, if that helps! :-) Try this:
http://search.atomz.com/search/?sp-a=0002078e-spsp-q=cgisp-k=Books
(It looks for O'Reilly books pages containing 'cgi'.)

Yeah, I know, I'd rather roll my own, too, given time...
Re: Guide search engine (was Re: multiple copies of a module)
> BTW: Your email client is broken and not wrapping words.

I know--sorry. I'm fixing that this week. I'm just going through the RFCs to see exactly how to implement this right... (The email client is a web-based thing I've written in mod_perl--of course ;-)

> I just wrote a very simple SQL based engine, so I would say I'm happy with that. It's fast and it's all in perl. I could very simply rip out the search parts of the code for someone to play with if they wanted to.

Sounds good. Personally, I'd rather have a simple engine we can fiddle with ourselves than a big system written in C. Does your engine generate a database from flat files? Is there some basic parameterisation (a 'stop list' for common words, definable 'keyword' characters, ...)?

> I think word highlighting is overrated. It's only necessary in this case because the guide is so damn huge now. The size problem could be eliminated by making the guide split itself up into smaller sections. My proposal would be to do that by converting the guide to docbook XML and using AxKit to display the resulting docbook pages. The AxKit docbook stylesheets are nice and friendly, and written in Perl, not some obscure XML stylesheet language. And after all that, it would make converting the guide to a format O'Reilly likes to publish (i.e. docbook) trivial.

Your word highlighting statement is, I suspect, controversial. On the other hand, converting to docbook is unlikely to meet much resistance from users--as long as Stas doesn't mind maintaining it!... To get the best of both worlds, why not simply chain the search engine results through a filter that does the highlighting? I bet someone's written such a filter already--anyone?

> > My only concern is that it seems a little odd to keep this just to the Guide. Wouldn't it be useful for the rest of perl.apache.org? I wouldn't have thought it's much extra work to add a drop-down box to search specific areas of the site (the Guide being one)...
> perl.apache.org already has a search engine.

So I've heard, but:
* Where is it? (doing a Find on the front page doesn't show it)
* Does it do highlighting?
* Can you select a subset of the site? (e.g. just the Guide)

> > If there's a good reason to have the Guide's search engine separate to the rest of perl.apache.org, should it have a separate domain (modperlguide.org?, guide.perl.apache.org?)?
> guide.modperl.org ?

Looks like modperl.org is taken:

  Domain Name: MODPERL.ORG
  Registrar: NETWORK SOLUTIONS, INC.
  Whois Server: whois.networksolutions.com
  Referral URL: www.networksolutions.com
  Name Server: DNS2.BASCOM.COM
  Name Server: DNS.THAKKAR.NET
  Updated Date: 24-nov-1999

They're not using it though--maybe they would transfer? Probably better to stick with the perl.apache.org domain though.

BTW, thanks to everyone who's already responded privately to my renewed request. Keep it up!

--
Jeremy Howard
[EMAIL PROTECTED]
Re: Guide search engine (was Re: multiple copies of a module)
On Wed, 17 May 2000, Jeremy Howard wrote:
> > I just wrote a very simple SQL based engine, so I would say I'm happy with that. It's fast and it's all in perl. I could very simply rip out the search parts of the code for someone to play with if they wanted to.
> Sounds good. Personally, I'd rather have a simple engine we can fiddle with ourselves than a big system written in C. Does your engine generate a database from flat files? Is there some basic parameterisation (a 'stop list' for common words, definable 'keyword' characters, ...)?

Well, it's just perl, so there's a separate word tokenizer, a separate db inserter and a separate searcher (which is split into a query parser and an SQL builder). The db inserter is aware of "ignore words", which are stored in the DB.

> > I think word highlighting is overrated. It's only necessary in this case because the guide is so damn huge now. The size problem could be eliminated by making the guide split itself up into smaller sections. My proposal would be to do that by converting the guide to docbook XML and using AxKit to display the resulting docbook pages. The AxKit docbook stylesheets are nice and friendly, and written in Perl, not some obscure XML stylesheet language. And after all that, it would make converting the guide to a format O'Reilly likes to publish (i.e. docbook) trivial.
> Your word highlighting statement is, I suspect, controversial. On the other hand, converting to docbook is unlikely to meet much resistance from users--as long as Stas doesn't mind maintaining it!... To get the best of both worlds, why not simply chain the search engine results through a filter that does the highlighting? I bet someone's written such a filter already--anyone?
>
> > > My only concern is that it seems a little odd to keep this just to the Guide. Wouldn't it be useful for the rest of perl.apache.org? I wouldn't have thought it's much extra work to add a drop-down box to search specific areas of the site (the Guide being one)...
> > perl.apache.org already has a search engine.
> So I've heard, but:
> * Where is it? (doing a Find on the front page doesn't show it)

At the bottom of all guide pages.

> * Does it do highlighting?

No.

> * Can you select a subset of the site? (e.g. just the Guide)

No.

--
Matt/
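Matt describes his engine's structure: a separate tokenizer, a database inserter that skips "ignore words" stored in the DB, and a searcher split into a query parser and an SQL builder. That structure can be sketched as follows. This is not Matt's code, which wasn't posted. It uses Python's built-in `sqlite3` in place of whatever SQL database he used, and the `-word` exclusion syntax in the query parser is a hypothetical example.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE ignore_words (word TEXT PRIMARY KEY);
CREATE TABLE postings (word TEXT, doc TEXT, PRIMARY KEY (word, doc));
"""


def tokenize(text):
    """Word tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def insert_doc(db, doc, text):
    """DB inserter: skips the "ignore words" that are stored in the database itself."""
    ignore = {row[0] for row in db.execute("SELECT word FROM ignore_words")}
    rows = {(w, doc) for w in tokenize(text) if w not in ignore}
    db.executemany("INSERT OR IGNORE INTO postings (word, doc) VALUES (?, ?)", rows)


def parse_query(query):
    """Query parser: terms prefixed with '-' are excluded, the rest are required."""
    required, excluded = [], []
    for term in query.split():
        if term.startswith("-"):
            excluded.extend(tokenize(term[1:]))
        else:
            required.extend(tokenize(term))
    return required, excluded


def build_sql(required, excluded):
    """SQL builder: INTERSECT one SELECT per required word, EXCEPT each excluded word."""
    sql = " INTERSECT ".join(["SELECT doc FROM postings WHERE word = ?"] * len(required))
    for _ in excluded:
        sql += " EXCEPT SELECT doc FROM postings WHERE word = ?"
    return sql + " ORDER BY 1", required + excluded


def search(db, query):
    required, excluded = parse_query(query)
    if not required:
        return []
    sql, params = build_sql(required, excluded)
    return [row[0] for row in db.execute(sql, params)]
```

Usage: create the tables with `db.executescript(SCHEMA)` on a `sqlite3.connect(":memory:")` connection, add stop words to `ignore_words`, call `insert_doc` per page, then `search(db, "mod_perl -performance")`. Keeping the stop list in the database, as Matt does, lets it be changed without touching code. Words added to it only take effect for documents inserted afterwards.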
Re: Guide search engine (was Re: multiple copies of a module)
At 11:19 17/05/2000 -0500, Jeremy Howard wrote:
> Your word highlighting statement is, I suspect, controversial. On the other hand, converting to docbook is unlikely to meet much resistance from users--as long as Stas doesn't mind maintaining it!... To get the best of both worlds, why not simply chain the search engine results through a filter that does the highlighting? I bet someone's written such a filter already--anyone?

I haven't played with it, but getting docbook out of the guide should be as easy as using Pod::DocBook. FWIW, there's also been some work done on coming up with an xpod DTD, but I don't know how far it's advanced.

.Robin
To err is human, to purr feline.
Re: Guide search engine (was Re: multiple copies of a module)
On Wed, 17 May 2000, Robin Berjon wrote:
> At 11:19 17/05/2000 -0500, Jeremy Howard wrote:
> > Your word highlighting statement is, I suspect, controversial. On the other hand, converting to docbook is unlikely to meet much resistance from users--as long as Stas doesn't mind maintaining it!... To get the best of both worlds, why not simply chain the search engine results through a filter that does the highlighting? I bet someone's written such a filter already--anyone?
>
> I haven't played with it, but getting docbook out of the guide should be as easy as using Pod::DocBook. FWIW, there's also been some work done on coming up with an xpod DTD, but I don't know how far it's advanced.

I've played with Pod::DocBook, and it's a good start, but it uses the DocBook SGML DTD, so you can't process its output with XML tools. It also doesn't support =over =item =back, which is a pretty major limitation, IMHO. However, patching it to support that shouldn't be too hard.

--
Matt/
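Matt notes that Pod::DocBook lacked `=over`/`=item`/`=back` support. The mapping such a patch needs is roughly this: `=over` opens an `<itemizedlist>`, each `=item` starts a `<listitem>`, and `=back` closes both. Below is a minimal Python sketch of that mapping. It is not Pod::DocBook's code. It handles bulleted lists (including nested ones) and treats every other paragraph, including other POD commands, as a plain `<para>`. Formatting codes such as `C<>` are not interpreted.

```python
from xml.sax.saxutils import escape


def pod_lists_to_docbook(pod):
    """Map =over/=item/=back to DocBook itemizedlist/listitem; other paragraphs become para."""
    out = []
    open_items = []  # one flag per open =over: True once a <listitem> is open in it
    for para in pod.strip().split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if para.startswith("=over"):
            out.append("<itemizedlist>")
            open_items.append(False)
        elif para.startswith("=item"):
            if not open_items:
                raise ValueError("=item outside =over")
            if open_items[-1]:
                out.append("</listitem>")  # close the previous item in this list
            out.append("<listitem>")
            open_items[-1] = True
            label = para[len("=item"):].strip().lstrip("*").strip()
            if label:
                out.append(f"<para>{escape(label)}</para>")
        elif para.startswith("=back"):
            if not open_items:
                raise ValueError("=back without =over")
            if open_items.pop():
                out.append("</listitem>")
            out.append("</itemizedlist>")
        else:
            # Body text after an =item stays inside that item's <listitem>.
            out.append(f"<para>{escape(para)}</para>")
    if open_items:
        raise ValueError("=over without =back")
    return "\n".join(out)
```

The stack of flags is what makes nesting work: an inner `=back` closes only the inner list. The outer `<listitem>` stays open until the next outer `=item` or `=back`.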
Re: Guide search engine (was Re: multiple copies of a module)
I know I'm late to this party, but I thought I'd point out a couple of options:

- The Search::InvertedIndex module on CPAN (uses dbm files, I think).
- The DBIx::TextIndex module on CPAN (uses MySQL).
- The WAIT module on CPAN (uses dbm files).
- Glimpse: http://webglimpse.org/.
- Swish++: http://www.best.com/~pjl/software/swish/ (no, it's not the same one apache.org is using).

I've also had great success with htdig. Maybe I'll try spidering the guide with it and see how it does.

- Perrin
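The modules Perrin lists and Jay's prototype share the inverted-index idea: map each word to the list of documents containing it. Jay stores those lists as packed doc-id integers. The Python sketch below uses `struct` with `>I` (unsigned 32-bit big-endian, the same layout as Perl's `pack("N", ...)`). An in-memory dict stands in for the MySQL table, and an AND query is the intersection of the unpacked lists. This illustrates the technique only. It is not the storage format of Search::InvertedIndex, DBIx::TextIndex or WAIT.

```python
import re
import struct
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """docs maps integer doc ids to text; returns word -> packed, sorted doc ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            postings[word].add(doc_id)
    # '>I' is an unsigned 32-bit big-endian integer, like Perl's pack("N", ...).
    return {word: struct.pack(f">{len(ids)}I", *sorted(ids))
            for word, ids in postings.items()}


def unpack_ids(blob):
    return struct.unpack(f">{len(blob) // 4}I", blob)


def search(index, query):
    """AND query: ids of documents that contain every query word."""
    result = None
    for word in tokenize(query):
        ids = set(unpack_ids(index.get(word, b"")))
        result = ids if result is None else result & ids
    return sorted(result) if result else []
```

Packing costs 4 bytes per posting and lets a whole posting list be fetched as one column value. Because the ids are stored sorted, a production version could merge lists instead of building sets.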
Re: Guide search engine (was Re: multiple copies of a module)
> > ...the perl.apache.org search facility
> > * Where is it? (doing a Find on the front page doesn't show it)
> At the bottom of all guide pages.

How funny--I'd never even noticed it! I see that it's using 'Swish-E' http://sunsite.berkeley.edu/SWISH-E/. Stas--did you get that up and running? Can we tailor it for our needs?

Here's an attempt at listing what I think we've decided we should aim for:
- Allow restriction of search to just the guide
- Allow searching of other documents through a popup selection (probably make the guide the default?)
- Highlight found words
- Try and index in a way that suits programmers, not English writers, e.g. include @, %, $, :: in indexed words.

Have I missed anything? (I'm ignoring the docbook issue for the moment since it's not directly related, and I guess it's really Stas' call anyhow.)

I would have thought the best bet would be to put it on the footer of every perl.apache.org page. A popup which allows selecting a subset of the site might default to either 'whole site' or 'mod_perl Guide', or maybe it changes its default to whatever part of the site is currently being viewed...

The outstanding issues, I believe, are:
- Who looks after the perl.apache.org search facility? Are they happy to expand its functionality as described?
- What tool? Potential options so far are Swish-e, htdig, or custom Perl (perhaps based on Matt's engine). Any of these could be piped through a word-highlighting filter.
- What's the best 1st step? i.e. How can we get a simple search going quickly, while providing the foundation for a more complete system down the track?
- Who's going to do the actual work?

As I've mentioned, if a machine is required, I'm happy to provide it. However, I don't have the experience in this area to lead the work--although of course I'll contribute where I can! It would be nice to get a private mailing list going to avoid filling up this list too much more. Anyone who's interested in getting involved, email me, and I'll ensure that I add your name to the list. You don't have to be a programming guru, of course... there are always plenty of ways to get involved in these things.

--
Jeremy Howard
[EMAIL PROTECTED]
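Jeremy's last wishlist item, indexing in a way that suits programmers by keeping @, %, $ and :: in words, mostly comes down to the tokenizer. The Python sketch below assumes a token is an optional Perl sigil followed by word characters joined by `::`. It also emits the bare name and the package components, so a search for "Registry" still finds "Apache::Registry". The regular expression is an assumption for illustration. It is not what Swish-E, htdig or either of the deployed engines actually uses.

```python
import re

# Optional sigil ($, @, %), then word characters, optionally joined by :: separators.
PERL_TOKEN = re.compile(r"[$@%]?\w+(?:::\w+)*")


def index_terms(text):
    """Return each programmer-style token plus its bare and ::-split forms."""
    terms = []
    for tok in PERL_TOKEN.findall(text):
        terms.append(tok)
        bare = tok.lstrip("$@%")
        if bare != tok:
            terms.append(bare)  # "$r" is also findable as "r"
        if "::" in bare:
            terms.extend(bare.split("::"))  # "Apache::Registry" -> "Apache", "Registry"
    return terms
```

Case is preserved because Perl identifiers are case-sensitive. An index wanting case-insensitive matching could store a lowercased copy alongside each term.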
Re: multiple copies of a module
On Wed, 17 May 2000, Kees Vonk 7249 24549 wrote:
> However the URL in the guide:
> http://perl.apache.org/~dougm/Apache-PerlVINC-0.01.tar.gz
> does not exist. Is there any other place where I can find Apache::PerlVINC?

it's there now.

and, a reminder from when it was first posted: it's not on cpan because it needs a maintainer (besides me :), anybody interested?