Re: Search-Engine: Extend results with random records if there are no 'real' results
You are right, it doesn't make sense, but we have to do it so that it looks better. It's a search engine for a specific sector, so every result will satisfy the user (a little bit); that's why I wanted to do it this way. Does anyone know how? (With pagination.) Thank you!

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups CakePHP group.
To post to this group, send email to cake-php@googlegroups.com To unsubscribe from this group, send email to cake-php+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/cake-php?hl=en -~--~~~~--~~--~--~---
Re: Search-Engine: Extend results with random records if there are no 'real' results
On 25 Sep., 20:28, brian <bally.z...@gmail.com> wrote:
> On Fri, Sep 25, 2009 at 8:47 AM, braaan <martin.platt...@gmail.com> wrote:
>> I can't use ORDER BY RAND() on every page, because then the user would get different results on every refresh.
>
> Well, it wouldn't be very random otherwise, would it?

You are right. What I meant is that I don't want a new ORDER BY RAND() query on every result page. Example: the results on page 1 change when I go to page 2 and then back to page 1. That's not what I want.

>> I thought about doing the data fetching myself (and extending it if there are too few results) and giving it to the pagination manually. Is this possible?
>
> What do you mean by doing the data fetching myself and manually?

What I want to do:
1) Search for the user's string myself.
2) If the result count is below X, add some random results.
3) Use this result set (real results + applicable random results) and paginate exactly this result set (without running a RAND() query on every page change).

Do you know what I mean?

Greets, thank you
Re: Search-Engine: Extend results with random records if there are no 'real' results
On Sun, Sep 27, 2009 at 7:52 AM, braaan <martin.platt...@gmail.com> wrote:
> What I want to do:
> 1) Search for the user's string myself.
> 2) If the result count is below X, add some random results.
> 3) Use this result set (real results + applicable random results) and paginate exactly this result set (without running a RAND() query on every page change).
>
> Do you know what I mean?

Yes, that makes sense. I suppose you could save the query in the session.

But why do you want to show random results in the first place? Is this just for testing? If so, it would be simpler to use dummy data for now. It doesn't seem to me that showing random results in production would be useful. If a search query comes up empty, it's empty.
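The plan discussed in this thread (search once, pad with random records when there are fewer than X hits, keep that list, then paginate over it) could be sketched in plain PHP as follows. This is a hedged sketch, not code from the thread: all function names are hypothetical, `$store` stands in for the session (for example `$_SESSION`), and the search is passed in as a callable so it only runs when nothing is stored yet for that query.

```php
<?php
// Hypothetical helpers; $store stands in for the session.

// Real hits first, then random padding drawn from records that are not already hits.
function buildPaddedIds(array $realIds, array $candidateIds, int $minCount): array
{
    $ids = array_values(array_unique($realIds));
    if (count($ids) >= $minCount) {
        return $ids; // enough real results, no padding needed
    }
    $pool = array_values(array_diff($candidateIds, $ids));
    shuffle($pool); // the randomness happens once, when the list is built
    return array_merge($ids, array_slice($pool, 0, $minCount - count($ids)));
}

// Build the list once per query and reuse it, so page 1 stays the same after visiting page 2.
function cachedResultIds(array &$store, string $query, callable $build): array
{
    $key = 'search_' . md5($query);
    if (!isset($store[$key])) {
        $store[$key] = $build();
    }
    return $store[$key];
}

// Slice one page out of the stored ID list (pages are 1-based).
function pageOf(array $ids, int $page, int $perPage): array
{
    $page = max(1, $page);
    return array_slice($ids, ($page - 1) * $perPage, $perPage);
}
```

The IDs for one page could then be loaded with a conditions array such as `array('Model.id' => $pageIds)`, which CakePHP turns into an `IN (...)` clause. Because `IN` does not guarantee row order, the fetched rows would need re-sorting in PHP to match `$pageIds`. Hooking this into Cake's paginator links is not shown here.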
Re: Search-Engine: Extend results with random records if there are no 'real' results
On Fri, Sep 25, 2009 at 8:47 AM, braaan <martin.platt...@gmail.com> wrote:
> I can't use ORDER BY RAND() on every page, because then the user would get different results on every refresh.

Well, it wouldn't be very random otherwise, would it?

> I thought about doing the data fetching myself (and extending it if there are too few results) and giving it to the pagination manually. Is this possible?

What do you mean by doing the data fetching myself and manually?
Re: Search engine like function?
Check out http://bakery.cakephp.org/articles/view/sphinx-behavior and see if it fits your needs.

On May 1, 9:00 am, mbourque <mpbour...@ptc.com> wrote:
> I am building a data warehouse site that contains a lot of customer info. I need to give users of my app a search page that lets them enter natural search strings to find and narrow search results. I want each word in the search field to be treated as an AND clause in my search. I also need the search to act like full text, where multiple fields are considered for a match. I also have a special case where a numeric keyword is treated specially and only certain fields are searched.
>
> Examples:
>
> Keyword: dave
> Searches: ( `User`.`name` like '%dave%' OR `User`.`email` like '%dave%' )
>
> Keyword: dave thomas
> Searches: ( ( `User`.`name` like '%dave%' OR `User`.`email` like '%dave%' ) AND ( `name` like '%thomas%' OR `email` like '%thomas%' ) )
>
> Keyword: 341
> Searches: ( `User`.`id` = '341' )
>
> I have written a model function that divides the keywords into an array and creates an array of conditions that I pass to the find() command. This works great. However, this feels like a very common pattern, and I wonder if I just reinvented the wheel.
>
> Sooner or later I will be asked to add special keyword handlers such as:
> dave OR thomas
> dave AND thomas
> "dave thomas"
> "dave thomas" OR "david thomas"
>
> Is there already a Cake pattern or helper that I should be using? If not, I may just create one for the good of the community.
>
> --
> View this message in context: http://n2.nabble.com/Search-engine-like-function--tp2753945p2753945.html
> Sent from the CakePHP mailing list archive at Nabble.com.
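A minimal sketch of the keyword-splitting approach described above: each word becomes an `OR` over `User.name` and `User.email`, the words are combined with `AND`, and a purely numeric word matches `User.id` instead. The function name is hypothetical, and the shape of the conditions array (operator in the key, nested `'AND'`/`'OR'` arrays) is assumed to match CakePHP 1.2-style `find()` conditions, so check it against your Cake version. LIKE wildcards in the user's input (`%`, `_`) are not escaped here.

```php
<?php
// Hypothetical helper: turn "dave thomas" into a CakePHP-style conditions array.
function buildKeywordConditions(string $input): array
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    if (empty($words)) {
        return []; // no keywords, no conditions
    }
    $and = [];
    foreach ($words as $word) {
        if (ctype_digit($word)) {
            // Numeric keyword: only the id field is searched.
            $and[] = ['User.id' => $word];
        } else {
            $like = '%' . $word . '%';
            $and[] = ['OR' => ['User.name LIKE' => $like, 'User.email LIKE' => $like]];
        }
    }
    return ['AND' => $and];
}
```

The result would be passed as the `conditions` option of `find('all', ...)`. Supporting explicit `OR` and quoted phrases would mean tokenizing the input instead of splitting on whitespace.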
Re: search engine bots and sessions
Most search engine bots/crawlers don't store cookies when they crawl your site. One way to solve this is to create a list of known bots and their user-agent strings. When a visitor arrives, check whether it's a bot, and if so, look in your sessions table to see whether it has been here before, so you can reuse that session. Bear in mind that this solves the problem for most bots; however, there are spam bots that act the same way and are harder to track, since they tend to use 'normal' browser user-agent strings.

On Mar 13, 9:35 am, wowfka <a.lic...@gmail.com> wrote:
> Hi, I have a little problem with search engine bots :) I am storing sessions in the database and also tracking visitors on the site with records from that database. Recently I saw multiple records with the same IP address, tracked it, and found that it is search engine bots (Google, Yahoo, etc.). There were many records with the same IP; each URL generates a separate session ID. Is it supposed to behave like this? Does each bot access to a URL create a new session ID? Maybe I'm missing something.
>
> Another question: is it a good solution to track users from the Cake session database? I could create another database and store the visiting users' information there, but I don't want to create unnecessary duplicate code.
>
> Thanks
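The known-bots idea could be sketched like this. The token list is a hypothetical, deliberately short example, and deriving a stable session ID from the user agent plus IP is one assumed way to "reuse that session" for a returning bot; how such an ID is handed to Cake's session layer depends on your setup and is not shown.

```php
<?php
// Hypothetical, incomplete list of lowercase user-agent substrings for well-known crawlers.
const KNOWN_BOT_TOKENS = ['googlebot', 'bingbot', 'msnbot', 'slurp', 'baiduspider', 'yandex'];

// Case-insensitive check of the user agent against the known-bot tokens.
function isKnownBot(string $userAgent): bool
{
    $ua = strtolower($userAgent);
    foreach (KNOWN_BOT_TOKENS as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Same bot (same UA and IP) always maps to the same ID, so repeat visits share one session row.
function botSessionId(string $userAgent, string $ip): string
{
    return 'bot' . sha1(strtolower($userAgent) . '|' . $ip);
}
```

As noted above, this does nothing for spam bots that send ordinary browser user-agent strings.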
Re: search engine bots and sessions
I am only tracking the users who visited in the last 30 minutes, so I can use the session database; I also increased the session expiry a little. As far as I know, Cake deletes only expired sessions.

On Mar 13, 12:01 pm, Braindead <markus.he...@gmail.com> wrote:
> Cake deletes expired entries from the session table automatically. Therefore using the session table to track users could lead to wrong figures.
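The 30-minute window mentioned above can be computed from expiry timestamps alone. This hedged sketch assumes each session row stores an `expires` Unix timestamp equal to the time of the last request plus the session timeout; that column meaning and the function name are assumptions, not taken from Cake's source. It also shows why the session expiry has to be at least as long as the window: rows for visitors older than the timeout may already have been deleted.

```php
<?php
// Hypothetical helper: count sessions with a request in the last $window seconds.
// Assumes expires = (time of last request) + $timeout for every row.
function countActiveSessions(array $expiresTimestamps, int $now, int $timeout, int $window): int
{
    // Last request at t means expires = t + timeout, so "t >= now - window"
    // is the same test as "expires >= now - window + timeout".
    $cutoff = $now - $window + $timeout;
    $count = 0;
    foreach ($expiresTimestamps as $expires) {
        if ($expires >= $cutoff) {
            $count++;
        }
    }
    return $count;
}
```

In SQL the same test becomes a single `WHERE expires >= <cutoff>` condition on the session table.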
Re: search engine bots and sessions
Yes, thank you for the suggestion. I was thinking about this solution too; I just thought Cake took care of bots.
Re: search engine bots and sessions
Cake deletes expired entries from the session table automatically. Therefore using the session table to track users could lead to wrong figures. Multiple records for the same IP address are ok. Try to open your site with Firefox and IE at the same time. There will be 2 records for your IP address, because the two browsers don't share sessions.
Re: search engine bots and sessions
You can detect a spider from its UA string (see http://user-agents.org):

    $ua = (string) env('HTTP_USER_AGENT');
    // strpos() returns false when there is no match and 0 for a match at the
    // very start, so compare strictly with !== false instead of == true.
    if (strpos($ua, '+http') !== false
        || strpos($ua, 'http') !== false
        || strpos($ua, 'www.') !== false
        || strpos($ua, '@') !== false) {
        // I'm a spider
    }
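The same heuristic can be wrapped in a function that is easy to test. This is a sketch with a hypothetical name; the needles are those from the snippet above (`'+http'` is omitted because any UA containing it also contains `'http'`). Like any UA check, it is defeated by bots that send ordinary browser strings.

```php
<?php
// Hypothetical wrapper: crawler UAs often embed a URL or a contact address.
function looksLikeSpider(?string $userAgent): bool
{
    $ua = (string) $userAgent; // a missing header becomes ''
    foreach (['http', 'www.', '@'] as $needle) {
        // Strict comparison: a match at position 0 must still count.
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}
```

The second test case below (`http-client/1.0`) is exactly the match-at-position-0 case that a loose `== true` comparison misses.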
Re: Search Engine Bots Generating Strange Queries
Hi Mike,

If you're using Apache, it has some features in the .htaccess file that will allow you to block access to your server for bots causing you trouble.

In your Cake 404 display page, keep track of the number of times a 404 is generated per IP address, and if it exceeds a threshold, log that IP address to a text file. Humans browsing a website will not generate many 404 messages, even if they have bad bookmarks or follow old links from search engines. So an IP address requesting more than one hundred 404 pages is likely a problem bot.

Each time a 404 page is displayed, log the IP to a database with a counter. When the counter reaches your limit, add that IP address to a text file. In your .htaccess you can load this text file of IP addresses and apply rules to those addresses. It's up to you whether you display a static access-denied HTML page or simply refuse the connection.

Sorry, I don't remember the commands for the .htaccess file.
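A hedged sketch of the counting step and of rules Apache could apply. `$counts` stands in for the database table keyed by IP, and the function names are hypothetical. The rules use Apache 2.4 `mod_authz_core` syntax (an assumption about your server version; using `Require` in `.htaccess` typically needs `AllowOverride AuthConfig`). Since `.htaccess` cannot include a separate file, this sketch renders the rules themselves, to be written into `.htaccess` when the list changes.

```php
<?php
// Count one 404 for $ip; returns true exactly once, on the request that reaches the threshold.
function record404(array &$counts, string $ip, int $threshold): bool
{
    $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    return $counts[$ip] === $threshold;
}

// Render Apache 2.4 rules that allow everyone except the listed IPs.
function denyRules(array $ips): string
{
    $lines = ['<RequireAll>', '    Require all granted'];
    foreach ($ips as $ip) {
        // Skip anything that is not a valid IP so bad data can't break the config.
        if (filter_var($ip, FILTER_VALIDATE_IP) !== false) {
            $lines[] = '    Require not ip ' . $ip;
        }
    }
    $lines[] = '</RequireAll>';
    return implode("\n", $lines) . "\n";
}
```

With a threshold of 100, an IP that triggers 150 missing-page hits is reported once, at its 100th 404.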
Re: Search Engine Bots Generating Strange Queries
I'd actually say that using a permanent redirect (301, I believe) to your root (or that controller's index), rather than to the 404 page, might be a better solution. If your users/visitors won't see it, since you're not linking to it, it isn't really a bad solution, and I doubt you'd want any search engines indexing 404 errors in association with your site/domain. If it was a hacker, I don't think I'd send them a 404 message either; I'd just redirect them... If it was a Safari user, I'd likewise rather give them graceful degradation than a 404. That's just me, though. Standard incorrect addresses should still receive a 404. A 404 does serve a very important purpose.
Re: Search Engine Bots Generating Strange Queries
> I'd actually say using a permanent redirect (301, I believe) to your root (or that controller's index), rather than to the 404 page might be a better solution. If your users/visitors won't see it since you're not linking to it, it isn't really a bad solution, and I doubt you'd want any search engines indexing 404 errors in association with your site/domain. If it was a hacker, I don't think I'd send them a 404 message either, I'd just redirect them...if it was a Safari user,

You should not redirect unless the content has been moved. Sending the wrong response codes for incorrect URIs makes it difficult for web crawl operators to crawl your site correctly. Should a web crawl operator come to the conclusion that your site sends incorrect response codes, they might choose to crawl it aggressively, since the server's responses cannot be trusted.

Indexing bots will not index a page whose HTTP header carries a 404 response code. That response code tells the bots the URI points to no content. Bots will only index an error page when the 404 error message is sent with an HTTP 200 response code and a text/html content-type in the header, which is incorrect and more of an error on the server side than a problem with the bot.

If you send a 301/302 response code, you are telling the bot that this URI is valid and has been moved; both the source URI and the redirected URI will continue to be processed by the bot. Whereas if you tell the bot 404, the bot knows this URI is invalid and that the source page it came from is generating invalid URIs, and it can drop other URIs from that source.

Sending a hacker a 301 or 302 does nothing to change their behavior, and gives them no more information than a 404. Blocking a remote computer that makes too many invalid requests to your server does change the behavior of that remote computer: it stops it. That is about all you can do at this point. A hacker will return with a different IP address and attack again. So, hackers are a completely different topic :)
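The rule "redirect only when the content has actually moved, otherwise 404" can be expressed as a small decision function. This is a sketch, not code from the thread: `$moved` is a hypothetical map of old path to new path that the site would maintain, and `$exists` is whatever lookup tells you a path has real content.

```php
<?php
// Returns [status, location]; location is only set for redirects.
function responseFor(string $path, array $moved, callable $exists): array
{
    if (isset($moved[$path])) {
        return [301, $moved[$path]]; // content really moved: permanent redirect
    }
    if ($exists($path)) {
        return [200, null];
    }
    return [404, null]; // unknown URI: say so, instead of redirecting
}
```

In plain PHP the result could be sent with `http_response_code($status)`, plus `header('Location: ' . $location)` for the 301 case.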
Re: Search Engine Bots Generating Strange Queries
It may not index a 404, but it still checks the 404. For usability's sake I'd still prefer to redirect rather than send a 404. Although we were discussing bots, we have to keep the user in mind as well. I have personally traversed the URL path to see what may be found on some sites, and if Safari has the feature included out of the box, well... I'd rather present the user with something than nothing at all, and a 404 isn't my idea of proper degradation within the path. Either way, it's simply a matter of personal preference.

Google was not the first search engine to incorporate robots.txt, by the way... They were the first to incorporate rel=nofollow and also, I think, the SiteMap.xml idea.
Re: Search Engine Bots Generating Strange Queries
Most web crawlers won't download the body of a 404, because of the way servers send HTTP responses. When a crawler requests a page that is missing, it first receives the header of the response, and it can read the response code, content-type, and other information. The crawler can then stop the download of the content after it has checked the response code, reducing the bandwidth load on the server and the time the crawler spends on missing content.

If a redirect response is sent, the crawler must make another request to the server and will download the entire content of a page that does not reflect the source URL. The crawler will see a 200 response code on the new URI, download all the content, and increase the time and bandwidth spent crawling that domain.

But I understand what you're saying, Brendon, about it being a design choice. I'm just not sure traversing the URL path improves usability for visitors of the website they're visiting. Once they step up to an invalid URI they will be redirected somewhere else, which would stop the traversal of the URL.

Here's CNN as an example:

http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal/index.html
http://edition.cnn.com/2008/POLITICS/11/06/middle.east.peace.deal
http://edition.cnn.com/2008/POLITICS/11/06
http://edition.cnn.com/2008/POLITICS/11
http://edition.cnn.com/2008/POLITICS
http://edition.cnn.com/2008

While these links will produce a 404 response and display HTML, a web crawler will not download the content after it has rejected the response code in the header of the HTTP response. So the most bandwidth load placed on the server is a few bytes per bad URI. This makes your domain crawler-friendly, but a friendly crawler would not request phantom URIs.
Re: Search Engine Bots Generating Strange Queries
Another reason not to use redirects for missing URIs is that you could mistakenly create what is called a crawler trap. A crawler trap is a set of URLs that keep changing but keep producing the same content. The crawler gets stuck wasting its time downloading the same page, because it can't tell from the URL that the content is the same. While good crawlers have logic to prevent this problem, your site could still be flagged as poorly structured, and commercial crawlers will avoid indexing your content.
Re: Search Engine Bots Generating Strange Queries
Thank you, Mathew. I log it every time before throwing the 404, and I figured whatever was creating these requests would stop, but it continues. I'm so dadgum anal-obsessive it just kills me; it's hard to ignore... It is not coming from any 'known' bot either...
Re: Search Engine Bots Generating Strange Queries
Great advice, Mathew... Yes, I think this is the way to go: point every /controller/action that doesn't mean anything without an extra id to a 404. Once the crawler sees this 404, it would never try to fetch the same thing again. Thanks.

--
Thanks & Regards,
Novice.
Re: Search Engine Bots Generating Strange Queries
So you're saying the search bots are just walking all my actions as if they were subdirectories on a site? I'm not sure about this. Maybe I should disallow those specific requests with robots.txt? Do any other Cakers have an opinion on this? If I disallow www.mydomain.com/controller/action/, won't the bots stop walking all the actions?
Re: Search Engine Bots Generating Strange Queries
Hi Mike,

Disallowing that in your robots.txt is a waste of time. The robots.txt file was started by Google, and is not an officially supported feature of all crawlers. So they don't have to follow it, and I can tell you this doesn't sound like the Google bot anyway, because that bot doesn't generate phantom URIs.

Web crawlers can extract URIs from many different sources, and they can generate URIs as they see fit. URIs can come from HTML, CSS, SWF, JavaScript, and form post/get actions. I've even seen crawlers submit post requests to generate more URIs to crawl. Crawlers will also clean URIs by removing ids, changing queries, faking cookies, and sometimes rotating their IP address.

There are no rules about crawlers, no guidelines they have to follow, and no limits on how long they will crawl or how aggressively they will request URIs from your server. You should modify your Routes to point to a 404 if they request paths that you don't want them to see.
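One way to apply this advice is a guard that treats `/controller/view` without a numeric id as "not found". This sketch works on the raw URL path; the function name and the "view needs a numeric id" rule are assumptions matching the `/controller/view/34` URLs described in this thread. CakePHP's Router can also constrain route parameters with regular expressions; check your version's routing documentation for the exact form.

```php
<?php
// Hypothetical guard: /controller/view must be followed by exactly one numeric id.
function isValidViewRequest(string $path): bool
{
    // Split the path into non-empty segments, e.g. "/articles/view/34" -> [articles, view, 34].
    $segments = array_values(array_filter(explode('/', trim($path, '/')), 'strlen'));
    if (count($segments) >= 2 && $segments[1] === 'view') {
        return count($segments) === 3 && ctype_digit($segments[2]);
    }
    return true; // other routes are not judged here
}
```

A request for which this returns `false` would then get a real 404 response rather than a redirect.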
Re: Search Engine Bots Generating Strange Queries
I'm no expert on this, but I'd guess that the bots are simply trying to walk the tree. If "http://mysite.com/directory/subdirectory/subsubdirectory" is valid, then "http://mysite.com/directory/subdirectory", "http://mysite.com/directory" and "http://mysite.com" are probably also valid. The GOOG doesn't know that those directories don't actually exist. In classic web development patterns there should be an index.htm file in each of these directories, so it can't hurt to look for them.

BTW: Safari (and possibly other browsers as well) lets you right-click on the title bar and offers the same kind of URL-shortening shortcuts in a popup menu.

On 30 Oct 2008, at 15:02, MikeK wrote:
> In a general CMS app written in CakePHP, I am noticing in my logs invalid queries being generated by various search engine bots, including Google, Inktomi, and Yahoo. What I'm wondering is WHY?
>
> For example, they are requesting http://mysite.com/controller/view instead of the correct http://mysite.com/controller/view/34 (ex: id 34).
>
> Nowhere on my site do I publish any links to /controllers/view without an id param. This is driving me slightly nuts. Why would a bot request a URI it has never seen?
>
> My validation code that checks for valid requests logs these occurrences, and every day I puzzle over my logs and examine the emitted web page source, wondering where or why they are requesting these invalid URIs. I've been dumping $_SERVER and there are no clues there either. The referer is always '/'.
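To see which "walked up the tree" URLs a crawler (or Safari's title-bar menu) could derive from one of your links, and so which ones should return real content or a clean 404, the parent URLs can be listed by trimming path segments. A sketch with a hypothetical function name; it assumes an absolute http(s) URL and ignores port, query string and fragment.

```php
<?php
// List parent URLs of $url, nearest first, by dropping one path segment at a time.
function ancestorUrls(string $url): array
{
    $parts = parse_url($url);
    $base = $parts['scheme'] . '://' . $parts['host'];
    $segments = array_values(array_filter(explode('/', $parts['path'] ?? ''), 'strlen'));
    $result = [];
    for ($n = count($segments) - 1; $n >= 0; $n--) {
        $result[] = $base . '/' . implode('/', array_slice($segments, 0, $n));
    }
    return $result;
}
```

For `http://mysite.com/controller/view/34` this yields `/controller/view`, `/controller` and `/`, which matches the extra requests MikeK is seeing.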
Re: Search Engine for a CakePHP app
Well, once you get the inner workings figured out, here's an interesting approach from a UI standpoint: http://link.toolbot.com/dbachrach.com/76372 On May 1, 12:42 am, Dr. Tarique Sani [EMAIL PROTECTED] wrote: On 5/1/07, John David Anderson (_psychic_) [EMAIL PROTECTED] wrote: On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote: Zend Framework? HERESY! They will be assimilated. Oh! They will say that it is by design ;) Cheers Tarique -- PHP for E-Biz: http://sanisoft.com Cheesecake-Photoblog needs you!: http://cheesecake-photoblog.org
Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 9:50 PM, Gonzalo Servat wrote: On 5/1/07, John David Anderson (_psychic_) [EMAIL PROTECTED] wrote: On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote: I created a search engine using a few classes from the Zend Framework. They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component. My content is almost completely in static view templates, so I created a script that uses wget to pull down the content, and some ZF classes to plug it into the index. Thanks for your reply John. Would you be able to provide more info on this? I'd be interested to know what logic you used to write the script that wget's the content, and if you have the ZF classes handy, that would rock too :) Want me to deliver some dinner too? :) Here's a censored copy of my crawler script (/app/webroot/crawl.php). It's a copy of the app/webroot/index.php file that I modified to run as a script. It's really easy to make cron scripts this way: the index.php file loads up the Cake core, so using it as a template works nicely. I plan to run it daily using cron/launchd on the production machine. After that is a copy of my search component (/app/controllers/components/search.php). Both files assume that you have some Zend libs in a vendors directory (/vendors/zend/Zend and /vendors/zend/Zend.php is how I have it set up). I don't need to provide those: they're freely available from Zend's website. Just make sure you wash your hands after handling. The normal disclaimers apply: this is a first try at this code, and it hasn't really been tested much. If you have suggestions or questions, feel free to send me gifts and/or bribes. I hope it helps you rather than deleting the contents of your disk and spreading your personal information on the Internet, but you'll have to assume some risks in using this code, as I can't really guarantee it yet.
:) Happy baking, -- John

<?php
$start = microtime(true);
/**
 * Do not change
 */
if (!defined('DS')) {
    define('DS', DIRECTORY_SEPARATOR);
}
/**
 * These defines should only be edited if you have cake installed in
 * a directory layout other than the way it is distributed.
 * Each define has a commented line of code that explains what you would change.
 */
if (!defined('ROOT')) {
    //define('ROOT', 'FULL PATH TO DIRECTORY WHERE APP DIRECTORY IS LOCATED. DO NOT ADD A TRAILING DIRECTORY SEPARATOR');
    //You should also use the DS define to separate your directories
    define('ROOT', dirname(dirname(dirname(__FILE__))));
}
if (!defined('APP_DIR')) {
    //define('APP_DIR', 'DIRECTORY NAME OF APPLICATION');
    define('APP_DIR', basename(dirname(dirname(__FILE__))));
}
/**
 * This only needs to be changed if the cake installed libs are located
 * outside of the distributed directory structure.
 */
if (!defined('CAKE_CORE_INCLUDE_PATH')) {
    //define('CAKE_CORE_INCLUDE_PATH', 'FULL PATH TO DIRECTORY WHERE CAKE CORE IS INSTALLED. DO NOT ADD A TRAILING DIRECTORY SEPARATOR');
    //You should also use the DS define to separate your directories
    define('CAKE_CORE_INCLUDE_PATH', ROOT);
}
///////////////////////////////
//DO NOT EDIT BELOW THIS LINE//
///////////////////////////////
if (!defined('WEBROOT_DIR')) {
    define('WEBROOT_DIR', basename(dirname(__FILE__)));
}
if (!defined('WWW_ROOT')) {
    define('WWW_ROOT', dirname(__FILE__) . DS);
}
if (!defined('CORE_PATH')) {
    if (function_exists('ini_set')) {
        ini_set('include_path', CAKE_CORE_INCLUDE_PATH . PATH_SEPARATOR . ROOT . DS . APP_DIR . DS . PATH_SEPARATOR . ini_get('include_path'));
        define('APP_PATH', null);
        define('CORE_PATH', null);
    } else {
        define('APP_PATH', ROOT . DS . APP_DIR . DS);
        define('CORE_PATH', CAKE_CORE_INCLUDE_PATH . DS);
    }
}
if (!include(CORE_PATH . 'cake' . DS . 'bootstrap.php')) {
    trigger_error("Can't find CakePHP core. Check the value of CAKE_CORE_INCLUDE_PATH in app/webroot/index.php. It should point to the directory containing your " . DS . "cake core directory and your " . DS . "vendors root directory.", E_USER_ERROR);
}
/*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
//Add Zend libs to include path
$include = ini_get('include_path');
//PATH_SEPARATOR is ':' on Unix and ';' on Windows
$new_include = $include . PATH_SEPARATOR . VENDORS . 'zend';
ini_set('include_path', $new_include);
//Include Zend_Search Classes
vendor('zend' . DS . 'Zend' . DS . 'Search' . DS .
Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote: Hi All, So I've gotten to a point in my app where I need to implement some sort of (basic) search engine functionality. It wouldn't be that hard to do if all the content was housed in database tables (as I could do something similar to what gwoo suggested in http://groups.google.com/group/cake-php/browse_thread/thread/d94d6521b70e6e09/b68f0389f18b8c5e?lnk=gst&q=search+engine&rnum=6#b68f0389f18b8c5e ), but my main problem is that a fair bit of content is found in view files (under app/views with a .html extension to differentiate them from .thtml files, as the latter often contain forms and stuff that shouldn't be searchable). I thought about maybe doing a grep on any file under app/views with a .html extension for the search term entered, but it's a pretty hacky way of doing it (and wouldn't scale well if the site got busy). So, apart from doing my own indexing, can anyone suggest a way I can achieve this? I've had a search around but couldn't find any cakebaker/bakery articles on a scenario similar to mine. I created a search engine using a few classes from the Zend Framework. They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component. My content is almost completely in static view templates, so I created a script that uses wget to pull down the content, and some ZF classes to plug it into the index. -- John Thanks in advance! - Gonzalo
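Gonzalo mentions "doing my own indexing" as the alternative to grepping view files on every request. The sketch below shows the core idea behind Lucene-style engines: an inverted index that maps each word to the pages containing it. It is not Zend_Search_Lucene's API. It assumes the page HTML has already been loaded into an array keyed by path, for example read from app/views or fetched with wget as John describes. The function names are hypothetical, and there is no ranking, stemming or Unicode handling.

```php
<?php
// Hypothetical sketch: a tiny in-memory inverted index (word => list of pages).

// Split text into lowercase ASCII words.
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// $pages maps a page path to its HTML.
function buildIndex(array $pages): array
{
    $index = [];
    foreach ($pages as $path => $html) {
        // Replace tags with spaces so words in adjacent elements stay separate.
        $text = preg_replace('/<[^>]*>/', ' ', $html);
        foreach (array_unique(tokenize($text)) as $word) {
            $index[$word][] = $path;
        }
    }
    return $index;
}

// Return the pages that contain every word of the query (AND semantics).
function searchIndex(array $index, string $query): array
{
    $hits = null;
    foreach (tokenize($query) as $term) {
        $paths = $index[$term] ?? [];
        $hits = ($hits === null) ? $paths : array_values(array_intersect($hits, $paths));
    }
    return $hits ?? [];
}
```

Building the index is the expensive part. You could run it from a nightly cron script and store the result with serialize() and file_put_contents(), so each search request only has to do array lookups.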
Re: Search Engine for a CakePHP app
On 5/1/07, John David Anderson (_psychic_) [EMAIL PROTECTED] wrote: On Apr 30, 2007, at 9:19 PM, Gonzalo Servat wrote: I created a search engine using a few classes from the Zend Framework. They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component. My content is almost completely in static view templates, so I created a script that uses wget to pull down the content, and some ZF classes to plug it into the index. Thanks for your reply John. Would you be able to provide more info on this? I'd be interested to know what logic you used to write the script that wget's the content, and if you have the ZF classes handy, that would rock too :) - Gonzalo
RE: Search Engine for a CakePHP app
Zend Framework? HERESY! -MI --- Remember, smart coders answer ten questions for every question they ask. So be smart, be cool, and share your knowledge. BAKE ON! blog: http://www.MarianoIglesias.com.ar From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED]] On behalf of John David Anderson (_psychic_) Sent: Tuesday, May 1, 2007 12:31 a.m. To: cake-php@googlegroups.com Subject: Re: Search Engine for a CakePHP app I created a search engine using a few classes from the Zend Framework. They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component.
Re: Search Engine for a CakePHP app
On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote: Zend Framework? HERESY! They will be assimilated. Resistance is futile. All it took was about 60 lines. That is all. -- John -MI --- Remember, smart coders answer ten questions for every question they ask. So be smart, be cool, and share your knowledge. BAKE ON! blog: http://www.MarianoIglesias.com.ar From: cake-php@googlegroups.com [mailto:[EMAIL PROTECTED]] On behalf of John David Anderson (_psychic_) Sent: Tuesday, May 1, 2007 12:31 a.m. To: cake-php@googlegroups.com Subject: Re: Search Engine for a CakePHP app I created a search engine using a few classes from the Zend Framework. They've got a nice port of the guts of Lucene, and it's pretty easy to create your own search component.
Re: Search Engine for a CakePHP app
On 5/1/07, John David Anderson (_psychic_) [EMAIL PROTECTED] wrote: On Apr 30, 2007, at 10:21 PM, Mariano Iglesias wrote: Zend Framework? HERESY! They will be assimilated. Oh! They will say that it is by design ;) Cheers Tarique -- PHP for E-Biz: http://sanisoft.com Cheesecake-Photoblog needs you!: http://cheesecake-photoblog.org
Re: Search Engine Optimization
Thanks AD.
Re: Search Engine Optimization
How about putting something like this in your layout:
<?php
if (!isset($description)) {
    // Strip HTML tags from the rendered view content
    $description = preg_replace('@</?[^>]*>@', '', $content_for_layout);
    // Collapse runs of whitespace into a single space
    $description = preg_replace('/\s\s+/', ' ', $description);
    // Keep the first 400 characters
    $description = substr($description, 0, 400);
}
?>
<meta name="description" content="<?php echo htmlspecialchars($description); ?>" />
That way, if the variable $description isn't set, the first 400 characters of your page content are used instead. HTH, AD7six
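Here is a standalone version of the same idea that can be tested outside a layout. The function name `metaDescription` is hypothetical. It differs from the layout snippet in two deliberate ways. First, it drops `<script>` and `<style>` elements together with their contents, because a plain tag-stripping regex leaves their code in the text. Second, it escapes the truncated text so it is safe inside the attribute.

```php
<?php
// Hypothetical helper: derive a META description from rendered HTML.
function metaDescription(string $html, int $limit = 400): string
{
    // Remove script and style elements together with their contents.
    $text = preg_replace('@<(script|style)\b[^>]*>.*?</\1>@is', ' ', $html);
    // Replace remaining tags with spaces so words from adjacent blocks stay apart.
    $text = preg_replace('@<[^>]*>@', ' ', $text);
    // Collapse whitespace runs and trim the ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    // Truncate first, then escape, so an entity is never cut in half.
    return htmlspecialchars(substr($text, 0, $limit), ENT_QUOTES);
}
```

Note that substr() counts bytes, so it can split a multibyte UTF-8 character at the limit. If the mbstring extension is available, mb_substr() avoids that.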
Re: Search Engine Optimization
Actually, Cake URLs are better optimized than standard webapp URLs because search engines seem to prefer path-based URLs to querystring-based URLs. Also, Cake's URLs are based on a routing system that supports regular expressions, so you can put URLs in whatever format you want. Just search this list for 'SEO,' and you'll see some interesting methods for how to optimize Cake-based URLs.
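Cake's actual Router has its own configuration syntax. The sketch below is only a framework-free illustration of the general idea: regular expressions decide which URL shapes map to which controller and action. The function `matchRoute`, the `$routes` table and the controller names are all hypothetical.

```php
<?php
// Hypothetical regex router: the first matching pattern wins, and its named
// captures become extra parameters. Not CakePHP's Router API.
function matchRoute(string $path, array $routes): ?array
{
    foreach ($routes as $pattern => $target) {
        if (preg_match($pattern, $path, $m)) {
            // Keep only named captures (string keys) such as "slug".
            $params = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
            return $target + $params;
        }
    }
    return null; // no route matched: a natural place to send a 404
}

$routes = [
    '#^/shop/(?<slug>[a-z0-9-]+)$#' => ['controller' => 'products', 'action' => 'view'],
    '#^/$#' => ['controller' => 'pages', 'action' => 'home'],
];
```

With this table, /shop/blue-elephants resolves to the products view action with slug "blue-elephants". The URL carries no file extension and no query string.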
Re: Search Engine Optimization
URLs rewritten with mod_rewrite are especially good for e-commerce sites, as you name pages according to the 'product/blog/news item' that they serve. So Google would come across: www.my_elephant_shop.com/shop/view/elephants (which is very SEO-friendly and easier for users to remember). I have used mod_rewrite URLs here: http://www.dubfrog.com to name pages after musicians etc., and I can say it has been indexed well, but the URLs are not as good as Cake's. Which leads to my next project. I am looking at using Cake to build another shop, and I think the URLs and AJAX support will be ace. I think the search engines will index product pages even better, as they won't even have a .html or .php extension. Cake also makes it easier to pick up the last '/requested_product' from the end of a URL (in order to highlight slug text on the served page with JavaScript, for example). --- SEO question: You can also serve dynamic META tags using PHP so that your META keywords and description tags match your served page's 'products / blogs / etc'. I've done this dynamic META malarkey on non-Cake sites, but I don't know how to do it with Cake. I would like to, as it has been successful. What would be the most conventional way to start getting REQUEST vars (about products) into the META tags, please? ta,
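One way to wire this up, sketched in plain PHP, is to build the tags from the record the page is displaying. The function `productMetaTags` and the `tags`/`summary` keys of `$product` are assumptions. In a CakePHP controller you would typically look the product up (for example by the slug at the end of the URL) and pass the values to the view and layout with `$this->set()`.

```php
<?php
// Hypothetical sketch: build META keyword/description tags from a product
// record. The $product array shape ('tags', 'summary') is an assumption.
function productMetaTags(array $product): string
{
    $keywords = implode(', ', $product['tags'] ?? []);
    $description = $product['summary'] ?? '';
    // Escape values so quotes or angle brackets can't break the markup.
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES);
    };
    return '<meta name="keywords" content="' . $esc($keywords) . '" />' . "\n"
        . '<meta name="description" content="' . $esc($description) . '" />';
}
```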
Re: Search Engine Optimization
You can use the super-cool HeadHelper (http://cakeforge.org/snippet/detail.php?type=snippet&id=56) to dynamically put things in the header of your layout. See this thread for details: http://groups.google.com/group/cake-php/browse_thread/thread/53f34e06e26c3b59/20ef002542bf96ef?lnk=gst&q=headhelper&rnum=2#20ef002542bf96ef HeadHelper has a register_meta function you could use.
Re: Search Engine Optimization
no On 7/27/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: We're bidding on a website redesign and are heavily pitching the use of CakePHP. The client is very concerned about URLs like: www.site.com/page instead of www.site.com/page.html Are CakePHP URLs any more difficult for search engines to index than the standard page.html form?
Re: Search engine
You'd use it by creating a function in your model that makes a custom SQL call, e.g.:
class Post extends AppModel {
    var $name = 'Post';

    function posterFirstName() {
        $ret = $this->query("SELECT first_name FROM posters_table WHERE poster_id = 1");
        // query() returns rows keyed by table name: $ret[0]['posters_table']['first_name']
        $firstName = $ret[0]['posters_table']['first_name'];
        return $firstName;
    }
}
Re: Search engine
Matt, you don't have to write so much SQL.
class Post extends AppModel {
    var $name = 'Post';

    function posterFirstName($poster_id) {
        // Inside the Post model, call the magic finder on $this, not $this->Post.
        // Assumes first_name is a column of this model's table.
        $ret = $this->findByPosterId($poster_id, 'first_name');
        // findBy<Field>() returns a single record keyed by model name.
        $firstName = $ret['Post']['first_name'];
        return $firstName;
    }
}
Re: Search engine
I just copied the example in the manual.
Re: Search engine
OK, it works... partly. I know where to put this and the rest, but it doesn't search and doesn't show me the results.