Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 9:50 PM, Gonzalo Servat wrote:

> On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
> > On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:
> >
> > I created a search engine using a few classes from the Zend
> > "Framework." They've got a nice port of the guts of Lucene, and it's
> > pretty easy to create your own search component.
> >
> > My content is almost completely in static view templates, so I
> > created a script that uses wget to pull down the content, and some
> > ZF classes to plug it into the index.
>
> Thanks for your reply John. Would you be able to provide more info
> on this? I'd be interested to know what logic you used to write the
> script that wget's the content, and if you have the ZF classes
> handy, that would rock too :)

Want me to deliver some dinner too? :)

Here's a censored copy of my crawler script (/app/webroot/crawl.php). This is a copy of the app/webroot/index.php file that I modified to run as a script. It's really easy to make cron scripts this way - the index.php file loads up the cake core, so using it as a template works nicely. I plan to run it daily using cron/launchd on the production machine.

After that is a copy of my search component (/app/controllers/components/search.php).

Both files assume that you have some Zend libs in a vendors directory (/vendors/zend/Zend and /vendors/zend/Zend.php is how I have it set up). I don't need to provide those: they're freely available from Zend's website. Just make sure you wash your hands after handling.

The normal disclaimers apply: this is a first pass at this code, and it hasn't really been tested much. If you have suggestions or questions, feel free to send me gifts and/or bribes. I hope it helps you rather than deletes the contents of your disk and spreads your personal information on the Internet, but you'll have to assume some risk in using this code, as I can't really guarantee it yet.
:)

Happy baking,

-- John

[...] 0) {
        //Remove the timestamp
        $parts = preg_split('/^\-\-\d+:\d+:\d+\-\-\s+http:\/\//', $line);
        //Remove surrounding whitespace and the site base URL
        $urls[] = str_replace($url, '', trim($parts[1]));
    }
}

//Re-create the Lucene search index
//(note: rmdir() only removes an empty directory)
rmdir($index_path);
$index = Zend_Search_Lucene::create($index_path);

//Add each document to the new index
foreach($urls as $path) {
    $link = $path;
    //wget saves directory indexes as .html files...
    if(substr($link, -1, 1) == '/') {
        $path = $link . 'index.html';
    }
    $doc_content = file_get_contents($download_path . DS . $url . $path);
    //loadHTML() parses an HTML string; loadHTMLFile() expects a file name
    $doc = Zend_Search_Lucene_Document_Html::loadHTML($doc_content);
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $link));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $doc_content));
    $index->addDocument($doc);
    //echo "Document added. URL: $link CONTENT: " . strlen($doc_content) . " chars\n";
}

$index->optimize();
$doc_size = $index->count();
$elapsed = number_format(microtime(true) - $start, 2);
echo "Crawl complete. Indexed $doc_size documents in $elapsed seconds.\n";
?>

[...] controller = $controller;
        //Construct the index object
        $this->index = Zend_Search_Lucene::open($index_path);
    }

    function execute($query) {
        //Perform a basic query
        $hits = $this->index->find($query);
        //For each hit, retrieve the originating URL
        foreach($hits as $hit) {
            $doc = $hit->getDocument();
            $hit->url = $doc->getFieldValue('url');
            $hit->title = $doc->getFieldValue('title');
            $hit->body = $doc->getFieldValue('body');
        }
        return $hits;
    }
}
?>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---
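The start of the crawler did not survive in the archive, but the surviving URL-extraction step (strip the wget timestamp, then strip the site base URL) can be sketched on its own. This is only an illustration, not John's original code: the function name extract_paths(), the host example.com, and the sample log lines are all made up, and the `--HH:MM:SS--  URL` log format is assumed from the regular expression in the fragment. Other wget versions may format that prefix differently, in which case the pattern would need adjusting.

```php
<?php
// Hypothetical helper: pull site-relative paths out of wget log lines.
function extract_paths(array $log_lines, $host)
{
    $paths = array();
    foreach ($log_lines as $line) {
        // Only request lines start with a --HH:MM:SS-- timestamp (assumed format).
        if (preg_match('/^--\d+:\d+:\d+--\s+http:\/\/(\S+)/', $line, $m)) {
            // Keep the URL only if it belongs to the site being crawled,
            // then drop the host so just the path remains.
            if (strpos($m[1], $host) === 0) {
                $paths[] = (string) substr($m[1], strlen($host));
            }
        }
    }
    // The same URL can show up more than once in a log; index it once.
    return array_values(array_unique($paths));
}

// Made-up sample log lines for illustration.
$log = array(
    '--21:50:03--  http://example.com/',
    'Resolving example.com... done.',
    '--21:50:04--  http://example.com/about/',
    '--21:50:05--  http://example.com/about/',
);

print_r(extract_paths($log, 'example.com'));
```

Anchoring the match on the host, rather than calling str_replace() on the whole line, avoids accidentally rewriting a path that happens to contain the host name again.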
Re: Search Engine for a CakePHP app
Well, once you get the inner workings figured out, here's an interesting approach from a UI standpoint: http://link.toolbot.com/dbachrach.com/76372

On May 1, 12:42 am, "Dr. Tarique Sani" <[EMAIL PROTECTED]> wrote:
> On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
> > On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:
>
> > > Zend Framework? HERESY!
>
> > They will be assimilated.
>
> Oh! they will say that "It is by design" ;)
>
> Cheers
> Tarique
>
> --
> =
> PHP for E-Biz: http://sanisoft.com
> Cheesecake-Photoblog needs you!: http://cheesecake-photoblog.org
> =
Re: Search Engine for a CakePHP app
On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
> On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:
>
> > Zend Framework? HERESY!
>
> They will be assimilated.

Oh! they will say that "It is by design" ;)

Cheers
Tarique

--
=
PHP for E-Biz: http://sanisoft.com
Cheesecake-Photoblog needs you!: http://cheesecake-photoblog.org
=
Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:

> Zend Framework? HERESY!

They will be assimilated. Resistance is futile.

All it took was about 60 lines. That is all.

-- John

> -MI
>
> --
> Remember, smart coders answer ten questions for every question they
> ask.
> So be smart, be cool, and share your knowledge.
>
> BAKE ON!
>
> blog: http://www.MarianoIglesias.com.ar
>
> From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED] On
> behalf of John David Anderson (_psychic_)
> Sent: Tuesday, May 1, 2007 12:31 a.m.
> To: cake-php@googlegroups.com
> Subject: Re: Search Engine for a CakePHP app
>
> I created a search engine using a few classes from the Zend
> "Framework." They've got a nice port of the guts of Lucene, and it's
> pretty easy to create your own search component.
RE: Search Engine for a CakePHP app
Zend Framework? HERESY!

-MI

---
Remember, smart coders answer ten questions for every question they ask.
So be smart, be cool, and share your knowledge.

BAKE ON!

blog: http://www.MarianoIglesias.com.ar

_
From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED] On behalf of John David Anderson (_psychic_)
Sent: Tuesday, May 1, 2007 12:31 a.m.
To: cake-php@googlegroups.com
Subject: Re: Search Engine for a CakePHP app

I created a search engine using a few classes from the Zend "Framework." They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component.
Re: Search Engine for a CakePHP app
On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
> On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:
>
> I created a search engine using a few classes from the Zend "Framework."
> They've got a nice port of the guts of Lucene, and it's pretty easy to create
> your own search component.
>
> My content is almost completely in static view templates, so I created a
> script that uses wget to pull down the content, and some ZF classes to plug
> it into the index.

Thanks for your reply John. Would you be able to provide more info on this? I'd be interested to know what logic you used to write the script that wget's the content, and if you have the ZF classes handy, that would rock too :)

- Gonzalo
Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:

> Hi All,
>
> So I've gotten to a point in my app that I need to implement some
> sort of (basic) search engine functionality. It wouldn't be that
> hard to do if all the content was housed in database tables (as I
> could do something similar to what gwoo suggested in
> http://groups.google.com/group/cake-php/browse_thread/thread/d94d6521b70e6e09/b68f0389f18b8c5e?lnk=gst&q=search+engine&rnum=6#b68f0389f18b8c5e )
> but my main problem is that a
> fair bit of content is found in view files (under app/views
> with .html extension to differentiate from .thtml files as the
> latter often contain forms and stuff that shouldn't be searchable).
> I thought about maybe doing a grep on any file under app/views with
> a .html extension for the search term entered, but it's a pretty
> hacky way of doing it (and wouldn't scale well if the site got
> busy), so, apart from doing my own indexing, can anyone suggest a
> way I can achieve this? I've had a search around but couldn't find
> any cakebaker/bakery articles on a scenario similar to mine.

I created a search engine using a few classes from the Zend "Framework." They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component.

My content is almost completely in static view templates, so I created a script that uses wget to pull down the content, and some ZF classes to plug it into the index.

-- John

> Thanks in advance!
>
> - Gonzalo
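To make the grep-versus-index trade-off in this exchange concrete, here is a toy inverted index in plain PHP. It is not Lucene and not anything posted in this thread: build_index(), search_index() and the two sample pages are hypothetical names and data, and the sketch leaves out the ranking, stemming and phrase queries a real search library provides. It only shows the basic idea: the view files are read once at indexing time, so a search becomes an array lookup instead of a scan of every file on every request.

```php
<?php
// Toy inverted index: maps each lowercase word to the pages containing it.
function build_index(array $docs)
{
    $index = array();
    foreach ($docs as $path => $html) {
        // Put a space before each tag so adjacent elements do not fuse
        // into one word, then strip the markup.
        $text = strtolower(strip_tags(str_replace('<', ' <', $html)));
        $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $index[$word][] = $path;
        }
    }
    return $index;
}

// Look up a single term; returns the list of matching paths.
function search_index(array $index, $term)
{
    $term = strtolower($term);
    return isset($index[$term]) ? $index[$term] : array();
}

// Made-up pages standing in for rendered view templates.
$docs = array(
    '/about.html'   => '<h1>About us</h1><p>We bake cakes.</p>',
    '/contact.html' => '<h1>Contact</h1><p>Email us about cakes.</p>',
);

$index = build_index($docs);
print_r(search_index($index, 'Bake'));
print_r(search_index($index, 'cakes'));
```

In this sketch the index lives in memory for one request; a real setup would persist it between requests, which is what the on-disk Lucene index in John's approach does.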