Re: Search Engine for a CakePHP app

2007-05-01 Thread John David Anderson (_psychic_)

On Apr 30, 2007, at 9:50 PM, Gonzalo Servat wrote:

> On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]>  
> wrote:
>
> On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:
>
> I created a search engine using a few classes from the Zend
> "Framework." They've got a nice port of the guts of Lucene, and it's
> pretty easy to create your own search component.
>
> My content is almost completely in static view templates, so I  
> created a script that uses wget to pull down the content, and some  
> ZF classes to plug it into the index.
>
> Thanks for your reply John. Would you be able to provide more info  
> on this? I'd be interested to know what logic you used to write the  
> script that wget's the content, and if you have the ZF classes  
> handy, that would rock too :)

Want me to deliver some dinner too?

:)

Here's a censored copy of my crawler script (/app/webroot/crawl.php).
This is a copy of the app/webroot/index.php file that I modified to
run as a script. It's really easy to make cron scripts this way: the
index.php file loads up the cake core, so using it as a template
works nicely. I plan to run it daily using cron/launchd on the
production machine.

After that is a copy of my search component
(/app/controllers/components/search.php). Both files assume that you
have some Zend libs in a vendors directory (/vendors/zend/Zend and
/vendors/zend/Zend.php is how I have it set up). I don't need to
provide those: they're freely available from Zend's website. Just make
sure you wash your hands after handling.

The normal disclaimers apply: this is a first pass at this code, and
it hasn't really been tested much. If you have suggestions or
questions, feel free to send me gifts and/or bribes. I hope it helps
you rather than deleting the contents of your disk and spreading your
personal information on the Internet, but you'll have to assume some
risk in using this code, as I can't really guarantee it yet. :)

Happy baking,

-- John


 0)
    {
        //Remove the timestamp
        $parts = preg_split('/^\-\-\d+:\d+:\d+\-\-\s+http:\/\//', $line);

        //Remove surrounding whitespace and the site base URL
        $urls[] = str_replace($url, '', trim($parts[1]));
    }
}

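//Re-create the Lucene search index. rmdir() only removes an empty
//directory, so delete the old index files first.
if(is_dir($index_path))
{
    array_map('unlink', glob($index_path . DS . '*'));
    rmdir($index_path);
}
$index = Zend_Search_Lucene::create($index_path);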

//Add each document to the new index
foreach($urls as $path)
{
    $link = $path;

    //wget saves directory indexes as .html files...
    if(substr($link, -1, 1) == '/')
    {
        $path = $link . 'index.html';
    }

    $doc_content = file_get_contents($download_path . DS . $url . $path);

    //loadHTML() parses an HTML string; loadHTMLFile() expects a file path
    $doc = Zend_Search_Lucene_Document_Html::loadHTML($doc_content);
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $link));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $doc_content));

    $index->addDocument($doc);

    //echo "Document added. URL: $link CONTENT: " . strlen($doc_content) . " chars\n";
}

$index->optimize();

$doc_size = $index->count();
$elapsed = number_format(microtime(true) - $start, 2);

echo "Crawl complete. Indexed $doc_size documents in $elapsed seconds.\n";

?>
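
The top of the crawler (the wget call and the loop over its log) is cut off in this copy. As a rough sketch only, assuming the older wget log format in which each request line starts with a `--HH:MM:SS--` timestamp, the visible parsing step could be wrapped up as below. The function name `extract_paths_from_wget_log` and the host `www.example.com` are made up for illustration and are not from the original script.

```php
<?php
// Hypothetical helper, not part of the original crawl.php: it mirrors the
// visible timestamp-stripping step to turn wget log lines into site paths.
function extract_paths_from_wget_log(array $lines, $host)
{
    $paths = array();
    foreach ($lines as $line) {
        // Only request lines start with a "--HH:MM:SS--" timestamp
        if (!preg_match('/^--\d+:\d+:\d+--\s+http:\/\/(\S+)/', $line, $m)) {
            continue;
        }
        // Strip the leading host so that only the path remains, e.g. "/about/"
        $path = $m[1];
        if (strpos($path, $host) === 0) {
            $path = substr($path, strlen($host));
        }
        $paths[] = $path;
    }
    // Drop duplicates in case the same URL shows up twice in the log
    return array_values(array_unique($paths));
}

// Sample log lines (assumed format, invented host)
$log = array(
    "--21:50:03--  http://www.example.com/",
    "           => `www.example.com/index.html'",
    "--21:50:04--  http://www.example.com/about/",
);
print_r(extract_paths_from_wget_log($log, 'www.example.com'));
```

Paths ending in "/" would then be mapped to index.html, as in the indexing loop of the script.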






controller = $controller;

        //Construct the index object
        $this->index = Zend_Search_Lucene::open($index_path);
    }

    function execute($query)
    {
        //Perform a basic query
        $hits = $this->index->find($query);

        //For each hit, retrieve the originating URL, title and body
        foreach($hits as $hit)
        {
            $doc = $hit->getDocument();
            $hit->url = $doc->getFieldValue('url');
            $hit->title = $doc->getFieldValue('title');
            $hit->body = $doc->getFieldValue('body');
        }

        return $hits;
    }
}

?>



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: Search Engine for a CakePHP app

2007-05-01 Thread tracyfloyd

Well, once you get the inner workings figured out here's an
interesting approach from a UI standpoint:
http://link.toolbot.com/dbachrach.com/76372



On May 1, 12:42 am, "Dr. Tarique Sani" <[EMAIL PROTECTED]> wrote:
> On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:
>
> > Zend Framework? HERESY!
> > They will be assimilated.
>
> Oh! they will say that "It is by design" ;)
>
> Cheers
> Tarique
>
> --
> =
> PHP for E-Biz:http://sanisoft.com
> Cheesecake-Photoblog needs you!:http://cheesecake-photoblog.org
> =





Re: Search Engine for a CakePHP app

2007-04-30 Thread Dr. Tarique Sani

On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
>
> On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:
>
>
>
> Zend Framework? HERESY!
> They will be assimilated.

Oh! they will say that "It is by design" ;)

Cheers
Tarique

-- 
=
PHP for E-Biz: http://sanisoft.com
Cheesecake-Photoblog needs you!: http://cheesecake-photoblog.org
=




Re: Search Engine for a CakePHP app

2007-04-30 Thread John David Anderson (_psychic_)

On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote:

> Zend Framework? HERESY!

They will be assimilated.

Resistance is futile.

All it took was about 60 lines.

That is all.

-- John


> -MI
>
> -- 
> -
>
> Remember, smart coders answer ten questions for every question they  
> ask.
> So be smart, be cool, and share your knowledge.
>
> BAKE ON!
>
> blog: http://www.MarianoIglesias.com.ar
>
> From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED] On
> behalf of John David Anderson (_psychic_)
> Sent: Tuesday, May 1, 2007 12:31 a.m.
> To: cake-php@googlegroups.com
> Subject: Re: Search Engine for a CakePHP app
>
>
>
> I created a search engine using a few classes from the Zend
> "Framework." They've got a nice port of the guts of Lucene, and it's
> pretty easy to create your own search component.
>
>
> >
>





RE: Search Engine for a CakePHP app

2007-04-30 Thread Mariano Iglesias

Zend Framework? HERESY!

-MI

---

Remember, smart coders answer ten questions for every question they ask. 
So be smart, be cool, and share your knowledge. 

BAKE ON!

blog: http://www.MarianoIglesias.com.ar


From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED] On behalf
of John David Anderson (_psychic_)
Sent: Tuesday, May 1, 2007 12:31 a.m.
To: cake-php@googlegroups.com
Subject: Re: Search Engine for a CakePHP app

 

I created a search engine using a few classes from the Zend "Framework."
They've got a nice port of the guts of Lucene, and it's pretty easy to create
your own search component.






Re: Search Engine for a CakePHP app

2007-04-30 Thread Gonzalo Servat

On 5/1/07, John David Anderson (_psychic_) <[EMAIL PROTECTED]> wrote:
>
>
> On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:
>
> I created a search engine using a few classes from the Zend "Framework."
> They've got a nice port of the guts of Lucene, and it's pretty easy to create
> your own search component.
>
> My content is almost completely in static view templates, so I created a
> script that uses wget to pull down the content, and some ZF classes to plug
> it into the index.
>

Thanks for your reply John. Would you be able to provide more info on this?
I'd be interested to know what logic you used to write the script that
wget's the content, and if you have the ZF classes handy, that would rock
too :)

- Gonzalo




Re: Search Engine for a CakePHP app

2007-04-30 Thread John David Anderson (_psychic_)

On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote:

> Hi All,
>
> So I've gotten to a point in my app where I need to implement some
> sort of (basic) search engine functionality. It wouldn't be that
> hard to do if all the content was housed in database tables (I
> could do something similar to what gwoo suggested in
> http://groups.google.com/group/cake-php/browse_thread/thread/d94d6521b70e6e09/b68f0389f18b8c5e?lnk=gst&q=search+engine&rnum=6#b68f0389f18b8c5e ),
> but my main problem is that a fair bit of content is found in view
> files (under app/views with a .html extension to differentiate them
> from .thtml files, as the latter often contain forms and stuff that
> shouldn't be searchable).
> I thought about maybe doing a grep on any file under app/views with
> a .html extension for the search term entered, but it's a pretty
> hacky way of doing it (and wouldn't scale well if the site got
> busy). So, apart from doing my own indexing, can anyone suggest a
> way I can achieve this? I've had a search around but couldn't find
> any cakebaker/bakery articles on a scenario similar to mine.

I created a search engine using a few classes from the Zend
"Framework." They've got a nice port of the guts of Lucene, and it's
pretty easy to create your own search component.

My content is almost completely in static view templates, so I  
created a script that uses wget to pull down the content, and some ZF  
classes to plug it into the index.

-- John

>
> Thanks in advance!
>
> - Gonzalo
>
>
> >

