RE: search engine for the Guide
Hi,

Try the http://www.comptek.ru/yandex/YandexFree.html search engine for web servers; it highlights the searched words...

-----Original Message-----
From: Stas Bekman [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 04, 2000 2:10 PM
To: Matt Sergeant
Cc: mod_perl list
Subject: Re: search engine for the Guide

> On Thu, 4 May 2000, Stas Bekman wrote:
> > On Wed, 3 May 2000, Matt Sergeant wrote:
> > > On Wed, 3 May 2000, Stas Bekman wrote:
> > > > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
> > >
> > > Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.
> >
> > DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.
> >
> > It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?
>
> Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.

Oh, I see. But I want to stress these 2 points:

1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
   a. can bring you to the exact line with the match
   b. has the keyword highlighted

2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.

What do you think?

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED] | http://perl.org http://stason.org/TULARC/
http://singlesheaven.com | http://perlmonth.com http://sourcegarden.org
RE: search engine for the Guide
On Fri, 23 Jun 2000, Vladislav Safronov wrote:

> Hi,
>
> Try the http://www.comptek.ru/yandex/YandexFree.html search engine for web servers; it highlights the searched words...

Heh, it helps when you know Russian and have the font installed :) Oh, I see the English link: http://www.comptek.ru:8100/english/yandex/YandexFree.html

Thanks Vladislav, but I think we are already quite happy with the two new engines provided by Randy and Vivek and the new version of the split guide, I really like it :). BTW, Randy's engine highlights the words.

> -----Original Message-----
> From: Stas Bekman [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 04, 2000 2:10 PM
> To: Matt Sergeant
> Cc: mod_perl list
> Subject: Re: search engine for the Guide
>
> > On Thu, 4 May 2000, Stas Bekman wrote:
> > > On Wed, 3 May 2000, Matt Sergeant wrote:
> > > > On Wed, 3 May 2000, Stas Bekman wrote:
> > > > > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
> > > >
> > > > Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.
> > >
> > > DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.
> > >
> > > It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?
> >
> > Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.
>
> Oh, I see. But I want to stress these 2 points:
>
> 1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
>    a. can bring you to the exact line with the match
>    b. has the keyword highlighted
>
> 2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.
>
> What do you think?

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
Re: search engine for the Guide
At 01:28 PM 5/4/00 +0300, Stas Bekman wrote:
> Two things:
>
> 1) I'd better concentrate on improving the content and structure of the Guide and will leave this search engine task to someone who needs to use the Guide but finds it unusable without a proper search engine.
>
> 2) perl.apache.org doesn't have mod_perl installed, so it's better to use some other site. I don't have any.

I'd be happy to host a search engine on my site. I'm not prepared to write one from scratch though, so if anyone has any suggestions of what the best 'off-the-shelf' solutions are I'd love to hear.

One option is to use Google. Have a look at this link:

http://www.google.com/search?q=cache:perl.apache.org/guide/performance.html+%22mod_perl+guide%22&hl=en

(put it on one line, of course). It highlights the searched terms ('mod_perl' 'guide' in this case).

Google doesn't allow searches within a site. However, if the same unique string were placed on each page of the guide, adding that string to the search query would only return hits from the guide. I think the best way to do this would be to create a custom search page that links to Google, and automatically includes the unique string in the request.

Of course, if there are any free search tools that provide this functionality and come with source, that would be even better!
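The custom search page described above could build its request roughly as follows. This is only a sketch under stated assumptions: the marker string `modperlguide-a1b2c3` and the function name `guide_search_url` are made up for illustration, and Python is used here purely for brevity. The `q` and `hl` parameters are the ones visible in the link above.

```python
from urllib.parse import urlencode

# Hypothetical marker: the thread only says "the same unique string"
# would be placed on each page of the guide.
GUIDE_MARKER = "modperlguide-a1b2c3"


def guide_search_url(user_query: str, marker: str = GUIDE_MARKER) -> str:
    """Return a search URL for the user's terms plus the quoted marker."""
    # Quoting the marker asks for it as an exact phrase, so only pages
    # that carry the marker should match.
    query = f'{user_query} "{marker}"'
    return "http://www.google.com/search?" + urlencode({"q": query, "hl": "en"})


print(guide_search_url("Apache::Registry"))
```

The search form on the guide's site would then only need to redirect the browser to the returned URL; the freshness of the results still depends entirely on the external engine.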
Re: search engine for the Guide
On Thu, 4 May 2000, Gunther Birznieks wrote:

> I would think that apache.org would provide a free open source search engine as an infrastructural resource? Can't we take advantage of that? Or is perl.apache.org not actually part of apache.org infrastructure? It seems to me that a lot more apache.org sites than just perl.apache.org would benefit, so it could be a shared resource.
>
> As for the programmatic characters not being recognized -- I think we kind of get used to that. It is a pain, but better a poor search than no search. I have actually wished for searching myself for just simple things here and there.

Gunther, the engine is there. Look at the bottom of /guide/index.html -- and it's the one that apache.org serves all the projects with. So it exists but it's not as good as it could be, and it searches the whole perl.apache.org site.

> Another thing that makes the full printout version (PDF) so nice is that you can do a Find.
>
> At 01:28 PM 5/4/00 +0300, Stas Bekman wrote:
> > On Thu, 4 May 2000, Matt Sergeant wrote:
> > > On Thu, 4 May 2000, Stas Bekman wrote:
> > > > > Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.
> > > >
> > > > Oh, I see. But I want to stress these 2 points:
> > > >
> > > > 1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
> > > >    a. can bring you to the exact line with the match
> > > >    b. has the keyword highlighted
> > > >
> > > > 2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.
> > > >
> > > > What do you think?
> > >
> > > You seem to have it all worked out. I look forward to seeing your search engine ;-)
> > >
> > > Seriously though, I have a search engine in the works, however I don't know how well it will apply to your scheme above. It looks like you're going to be better off writing one yourself. It's not too hard, provided you have a DB to store the index on. Let me know if you need some pointers.
> >
> > Two things:
> >
> > 1) I'd better concentrate on improving the content and structure of the Guide and will leave this search engine task to someone who needs to use the Guide but finds it unusable without a proper search engine.
> >
> > 2) perl.apache.org doesn't have mod_perl installed, so it's better to use some other site. I don't have any.
> >
> > Which leads to:
> >
> > If you suffer from an inability to get the best out of the Guide in the shortest time and wish to help others in the same boat, please create a searchable mirror site which answers the above demands. You will get a monument while you are alive at the perl.apache.org site if you are looking for one, but of course the most important thing is the great feeling of giving something back and not just taking.
> >
> > Hmm, maybe we should run another contest at the Perl conference. The name is 'find it in the Guide'. You will be given a number of unanswered posts from the mod_perl list and the first one that provides pointers in the Guide that solve these problems wins. This has a double effect:
> >
> > 1) You get the prize
> > 2) You finally answer the unanswered questions :)
> >
> > Have a good day, folks!

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
Re: search engine for the Guide
On Fri, 5 May 2000, Jeremy Howard wrote:

> At 01:28 PM 5/4/00 +0300, Stas Bekman wrote:
> > Two things:
> >
> > 1) I'd better concentrate on improving the content and structure of the Guide and will leave this search engine task to someone who needs to use the Guide but finds it unusable without a proper search engine.
> >
> > 2) perl.apache.org doesn't have mod_perl installed, so it's better to use some other site. I don't have any.
>
> I'd be happy to host a search engine on my site. I'm not prepared to write one from scratch though, so if anyone has any suggestions of what the best 'off-the-shelf' solutions are I'd love to hear.

Thanks! Let's figure out what's the best one we want first :)

> One option is to use Google. Have a look at this link:
>
> http://www.google.com/search?q=cache:perl.apache.org/guide/performance.html+%22mod_perl+guide%22&hl=en
>
> (put it on one line, of course). It highlights the searched terms ('mod_perl' 'guide' in this case).
>
> Google doesn't allow searches within a site. However, if the same unique string were placed on each page of the guide, adding that string to the search query would only return hits from the guide. I think the best way to do this would be to create a custom search page that links to Google, and automatically includes the unique string in the request.

Yeah, it's a nice trick. The thing that defeats it as a search engine is its freshness. We cannot tell Google to rehash the Guide when there is a new version, and searching the outdated version is a bad idea.

> Of course, if there are any free search tools that provide this functionality and come with source, that would be even better!

So far, from the personal replies to me, htdig is the best solution, given that we stuff many anchors in the text so you could jump directly to the right paragraph.

Keep these ideas/tries coming. Thanks!

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
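"Stuffing anchors in the text" could look something like the sketch below, which gives every paragraph of a page its own named anchor so that a search hit can link to `page.html#p3` rather than to the top of a long chapter. The function name, the `p` prefix and the blank-line paragraph rule are assumptions made for illustration, and Python is used only to keep the sketch short. Nothing here is specific to htdig; whether a given indexer records anchor targets would need to be checked separately.

```python
import html


def add_paragraph_anchors(text: str, prefix: str = "p") -> str:
    """Turn blank-line-separated paragraphs into <p> blocks with named anchors."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    out = []
    for i, para in enumerate(paragraphs, start=1):
        # <a name="..."> is the anchor style of the period; the id is
        # simply the paragraph's position on the page.
        out.append(f'<p><a name="{prefix}{i}"></a>{html.escape(para)}</p>')
    return "\n".join(out)


print(add_paragraph_anchors("first paragraph\n\nsecond & third"))
```

Because the anchors are positional, they would have to be regenerated, and the index rebuilt, whenever a new version of the Guide is released.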
Re: search engine for the Guide (was Re: Why does $r->print()...)
On Wed, 3 May 2000, Matt Sergeant wrote:
> On Wed, 3 May 2000, Stas Bekman wrote:
> > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
>
> Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.

DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.

It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
Re: search engine for the Guide
> On Thu, 4 May 2000, Stas Bekman wrote:
> > On Wed, 3 May 2000, Matt Sergeant wrote:
> > > On Wed, 3 May 2000, Stas Bekman wrote:
> > > > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
> > >
> > > Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.
> >
> > DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.
> >
> > It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?
>
> Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.

Oh, I see. But I want to stress these 2 points:

1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
   a. can bring you to the exact line with the match
   b. has the keyword highlighted

2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.

What do you think?

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
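The two requirements above (land on the exact line, highlight the keyword, and treat `::`, `$`, `@`, `{` and `->` as searchable) can be illustrated with a small sketch. It is not the engine anyone in the thread built: it does a plain literal substring match over the lines of one page, and the `line<N>` anchor names and the function name are made up for the example. Python is used only for brevity.

```python
import html


def find_lines(text: str, keyword: str):
    """Yield (line_number, html) for each line containing keyword literally."""
    if not keyword:
        return
    for number, line in enumerate(text.splitlines(), start=1):
        if keyword not in line:
            continue
        # Split on the raw keyword and escape the pieces separately, so
        # characters such as '>' in '$r->print' are matched before escaping.
        pieces = [html.escape(part) for part in line.split(keyword)]
        marked = f"<b>{html.escape(keyword)}</b>".join(pieces)
        yield number, f'<a name="line{number}"></a>{marked}'


page = "use Apache::Registry;\n$r->print($data);\nmy $x = 1;"
for number, fragment in find_lines(page, "$r->print"):
    print(number, fragment)
```

Because nothing is tokenized, `Apache::Registry` is never broken into `Apache` and `Registry`; the price is a linear scan of the text, which a real engine would replace with an index.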
Re: search engine for the Guide
On Thu, 4 May 2000, Matt Sergeant wrote:
> On Thu, 4 May 2000, Stas Bekman wrote:
> > > Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.
> >
> > Oh, I see. But I want to stress these 2 points:
> >
> > 1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
> >    a. can bring you to the exact line with the match
> >    b. has the keyword highlighted
> >
> > 2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.
> >
> > What do you think?
>
> You seem to have it all worked out. I look forward to seeing your search engine ;-)
>
> Seriously though, I have a search engine in the works, however I don't know how well it will apply to your scheme above. It looks like you're going to be better off writing one yourself. It's not too hard, provided you have a DB to store the index on. Let me know if you need some pointers.

Two things:

1) I'd better concentrate on improving the content and structure of the Guide and will leave this search engine task to someone who needs to use the Guide but finds it unusable without a proper search engine.

2) perl.apache.org doesn't have mod_perl installed, so it's better to use some other site. I don't have any.

Which leads to:

If you suffer from an inability to get the best out of the Guide in the shortest time and wish to help others in the same boat, please create a searchable mirror site which answers the above demands. You will get a monument while you are alive at the perl.apache.org site if you are looking for one, but of course the most important thing is the great feeling of giving something back and not just taking.

Hmm, maybe we should run another contest at the Perl conference. The name is 'find it in the Guide'. You will be given a number of unanswered posts from the mod_perl list and the first one that provides pointers in the Guide that solve these problems wins. This has a double effect:

1) You get the prize
2) You finally answer the unanswered questions :)

Have a good day, folks!

--
Stas Bekman | JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ | mod_perl Guide http://perl.apache.org/guide
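For anyone taking up the "write one yourself, with a DB to store the index" suggestion, the core is a table mapping each token to the pages and lines where it occurs. The sketch below is one possible shape, not Matt's engine: the table and function names are invented, SQLite stands in for whatever database is at hand, and tokens are split on whitespace only so that things like `$@` and `Apache::Registry` are stored intact. Python is used only to keep the sketch short.

```python
import sqlite3


def build_index(conn, pages):
    """Index pages (a dict of page name -> text) as (token, page, line) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings (token TEXT, page TEXT, line INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS postings_token ON postings (token)")
    rows = []
    for page, text in pages.items():
        for number, line in enumerate(text.splitlines(), start=1):
            # Whitespace-only splitting keeps punctuation inside tokens.
            for token in set(line.split()):
                rows.append((token, page, number))
    conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)
    conn.commit()


def lookup(conn, token):
    """Return (page, line) pairs where the exact token occurs."""
    cur = conn.execute(
        "SELECT page, line FROM postings WHERE token = ? ORDER BY page, line",
        (token,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, {"porting.html": "use Apache::Registry ;\nprint $@ if $@ ;"})
print(lookup(conn, "$@"))
```

Whitespace tokenization is deliberately crude: `Apache::Registry;` with the semicolon attached would be a different token, so a real indexer would need a more careful rule for trailing punctuation.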
Re: search engine for the Guide (was Re: Why does $r->print()...)
On Thu, 4 May 2000, Stas Bekman wrote:
> On Wed, 3 May 2000, Matt Sergeant wrote:
> > On Wed, 3 May 2000, Stas Bekman wrote:
> > > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
> >
> > Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.
>
> DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.
>
> It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?

Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.

--
Matt/

Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org http://xml.sergeant.org
Re: search engine for the Guide (was Re: Why does $r->print()...)
On May 04, 2000 at 10:37:05 +0100, Matt Sergeant twiddled the keys to say:
> On Thu, 4 May 2000, Stas Bekman wrote:
> > On Wed, 3 May 2000, Matt Sergeant wrote:
> > > On Wed, 3 May 2000, Stas Bekman wrote:
> > > > Yeah, I've been thinking about it. There was one site that has offered me to provide a good search engine and they did, but the problem is that they didn't keep up with new releases, so people were searching the outdated version, which is quite bad -- I've removed the reference to it, after asking them to update their copy for a few months, with no results.
> > >
> > > Can't we use WWW::Search - If I recall correctly some of the sites can be restricted to a domain, so you could build a search interface pretty easily.
> >
> > DESCRIPTION: This class is the parent for all access methods supported by the WWW::Search library. This library implements a Perl API to web-based search engines.
> >
> > It's not the search engine -- it's a Perl API to the search engines. We need a search engine not the API to it. Did I miss something?
>
> Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.

Leveraging the existing engines is a good idea, if you have fairly static content. If you're a moving target though, I'm of the opinion that it would be better to have an "in-house" engine. The existing engines are "nice" and don't invade your site too intrusively, so you end up with references which tend to lag current content. How to deal with that is not something I've thought about.

The one time I tried an engine I ended up with a forking agrep, which comes as part of the Glimpse package. Glimpse itself sucks (in my opinion), but agrep works pretty good. The problem with it is you have to fork, which of course sucks. Oh well.

My .02 :)

Rick Myers [EMAIL PROTECTED]

The Feynman Problem Solving Algorithm
1) Write down the problem.
2) Think real hard.
3) Write down the answer.
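The fork is only needed because agrep is an external program; approximate matching can also be done inside the server process. As a rough illustration, and explicitly not a port of agrep (whose matching is based on edit distance), the sketch below uses the similarity ratio from Python's standard `difflib` module. The function name and the 0.8 cutoff are arbitrary choices for the example.

```python
import difflib


def approx_grep(lines, word, cutoff=0.8):
    """Return (line_number, matched_word) for lines with a word close to `word`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # get_close_matches returns candidates whose similarity ratio to
        # `word` is at least `cutoff`, best match first.
        close = difflib.get_close_matches(word, line.split(), n=1, cutoff=cutoff)
        if close:
            hits.append((number, close[0]))
    return hits


text = ["PerlModule Apache::Registry", "the Registry handler", "nothing here"]
print(approx_grep(text, "Registy"))
```

This trades the cost of a fork for CPU time inside the process, so for a large document set it would still want an index in front of it.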
Re: search engine for the Guide
I would think that apache.org would provide a free open source search engine as an infrastructural resource? Can't we take advantage of that? Or is perl.apache.org not actually part of apache.org infrastructure? It seems to me that a lot more apache.org sites than just perl.apache.org would benefit, so it could be a shared resource.

As for the programmatic characters not being recognized -- I think we kind of get used to that. It is a pain, but better a poor search than no search. I have actually wished for searching myself for just simple things here and there.

Another thing that makes the full printout version (PDF) so nice is that you can do a Find.

At 01:28 PM 5/4/00 +0300, Stas Bekman wrote:
> On Thu, 4 May 2000, Matt Sergeant wrote:
> > On Thu, 4 May 2000, Stas Bekman wrote:
> > > > Yes. On some of the search engines (AltaVista springs to mind) you can search for things on particular web sites, or even links to particular web sites. So as long as AltaVista keeps its search contents up to date, you can leverage their engine. IIRC either Randall or Lincoln did a WebTechniques article about this a few months ago.
> > >
> > > Oh, I see. But I want to stress these 2 points:
> > >
> > > 1) Currently each chapter in the Guide is a huge document, so getting a search hit doesn't really help, as you still have to go through the page to find the exact section that you want to read. So I think we want a search engine that works not with the master version per se, but with a copy which has name anchors for each line and:
> > >    a. can bring you to the exact line with the match
> > >    b. has the keyword highlighted
> > >
> > > 2) Most search engines have problems with keywords that include non-alpha chars: if you search for Apache::Registry you will end up searching for Apache and Registry, since :: is ignored. Now think about '$r->print', 'BEGIN {', '$@', etc. All of these must be searchable in a doc with so many non-alpha characters.
> > >
> > > What do you think?
> >
> > You seem to have it all worked out. I look forward to seeing your search engine ;-)
> >
> > Seriously though, I have a search engine in the works, however I don't know how well it will apply to your scheme above. It looks like you're going to be better off writing one yourself. It's not too hard, provided you have a DB to store the index on. Let me know if you need some pointers.
>
> Two things:
>
> 1) I'd better concentrate on improving the content and structure of the Guide and will leave this search engine task to someone who needs to use the Guide but finds it unusable without a proper search engine.
>
> 2) perl.apache.org doesn't have mod_perl installed, so it's better to use some other site. I don't have any.
>
> Which leads to:
>
> If you suffer from an inability to get the best out of the Guide in the shortest time and wish to help others in the same boat, please create a searchable mirror site which answers the above demands. You will get a monument while you are alive at the perl.apache.org site if you are looking for one, but of course the most important thing is the great feeling of giving something back and not just taking.
>
> Hmm, maybe we should run another contest at the Perl conference. The name is 'find it in the Guide'. You will be given a number of unanswered posts from the mod_perl list and the first one that provides pointers in the Guide that solve these problems wins. This has a double effect:
>
> 1) You get the prize
> 2) You finally answer the unanswered questions :)
>
> Have a good day, folks!