on 3/12/03 5:45 PM, Nicholas Fitzgerald at [EMAIL PROTECTED] appended the following bits to my mbox:
> is that entire prog as it now exists. Notice I have NOT configured it as
> yet to go into the next level. I did this on purpose so I wouldn't have to
> kill it in the middle of operation and potentially screw stuff up. The
> way it is now, it looks at all the records in the database, updates them
> if necessary, then extracts all the links and puts them into the
> database for crawling on the next run through. Once I get this working
> I'll put a big loop in it so it keeps going until there's nothing left
> to look at. Meanwhile, if anyone sees anything in here that could be the
> cause of this problem please let me know!

I don't think I've found the problem, but I thought I'd point out a couple of things:

> // Open the database and start looking at URLs
> $sql = mysql_query("SELECT * FROM search");
> while($rslt = mysql_fetch_array($sql)){
>     $url = $rslt["url"];

The above lines get all the data from the table and then start looping through it...

> // Put the stuff in the search database
> $puts = mysql_query("SELECT * FROM search WHERE url='$url'");
> $site = mysql_fetch_array($puts);
> $nurl = $site["url"];
> $ncrc = $site["checksum"];
> $ndate = $site["date"];
> if($ndate <= $daycheck || $ncrc != $checksum){

That runs the same query again for this particular URL to set variables from the $site array, though you already have this info in the $rslt array. Reusing $rslt would save one query per row, potentially hundreds of queries in total.

> // Get the page title
> $temp = stristr($read,"<title>");
<snip>
> $tchn = ($tend - $tpos);
> $title = strip_tags(substr($read, ($tpos+7),$tchn));

Aside: Interesting way of doing things. I usually just preg_match these things, but I like this too.

> // Kill any trailing slashes
> if(substr($link,(strlen($link)-1)) == "/"){
>     $link = substr($link,0,(strlen($link)-1));
> }

Why are you killing the trailing slashes? That's going to make fopen do double the work to get to the pages.
That is, first it will request the page without the slash, then get a redirect response with the slash, and then request the page again.

> // Put the new URL in the search database
> $chk = mysql_query("SELECT * FROM search WHERE url = '$link'");
> $curec = mysql_fetch_array($chk);
> if(!$curec){
>     echo "Adding: $link\n";
>     $putup = mysql_query("INSERT INTO search SET url='$link'");
> }
> else{
>     continue;
> }

You might want to give a different variable name to the "new link", or encapsulate the above in a function, so your $link variables don't clobber each other.

>> indicate where the chokepoint might be. It seems to be when the DB
>> reaches a certain size, but 300 or so records should be a piece of cake
>> for it. As far as the debug backtrace, there really isn't anything there
>> that stands out. It's not an issue with a variable, something is going
>> wrong in the execution either of php, or a sql query. I'm not finding
>> any errors in the mysql error log, or anywhere else.

What URL is it dying on? You could probably echo each $url to the terminal to watch its progress and see where it stops.

I've had problems with Apache using custom PHP error docs where the error doc contained a PHP-generated image that wasn't found. Each image that failed would generate another PHP error, which cascaded until the server basically died.

KIND OF BROADER ASIDE REGARDING SEARCH ENGINE PROBLEMS:

I've also had recursion problems because PHP allows any characters to be appended after the request. For example, let's say you have an examples.php file and for some reason you have a relative link in examples.php to examples/somefile.html. If the examples directory doesn't exist, Apache will serve examples.php to the user using the request of examples/somefile.html.
A recursive search engine (that isn't too smart, i.e., infoseek and excite for colleges), will keep requesting things like: http://example.com/examples/examples/examples/examples/examples/examples/exa mples/examples/examples/examples/examples/examples/examples/examples/example s/examples/examples/examples/examples/examples/examples/examples/examples/ex amples/examples/examples/examples/examples/examples/examples/examples/exampl es/examples/examples/examples/examples/examples/examples/examples/examples/e xamples/examples/somefile.html As far as apache is concerned, it is fulfilling the request with the examples.php file and php just sees a really long query_string starting with /examples. I'm sure that isn't your problem, but I've been bit by it a few times. END OF ASIDE Hope some of that ramble helps. Please try to see if it is dieing on a particular URL so we can be of further assistance. Sincerely, Paul Burney <http://paulburney.com/> Q: Tired of creating admin interfaces to your MySQL web applications? A: Use MySTRI instead. Version 3.1 now available. <http://mystri.sourceforge.net/> -- PHP Database Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
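P.S. For the title extraction, the preg_match approach mentioned in the aside above might look roughly like the sketch below. It is only an illustration, not a drop-in replacement for the stristr/substr code: the function name extract_title and the sample $read string are made up here, and the pattern assumes a single, reasonably well-formed <title> element.

```php
<?php
// Hypothetical helper (not part of the original script): pull the text of
// the first <title> element out of a fetched page with one regular expression.
function extract_title($html)
{
    // i = case-insensitive tag names, s = let "." match newlines in the title.
    // [^>]* tolerates attributes on the opening tag; .*? stops at the first </title>.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        // Drop any stray markup and surrounding whitespace.
        return trim(strip_tags($m[1]));
    }
    return ''; // no title element found
}

// Made-up sample input standing in for the $read buffer in the crawler.
$read = "<html><head><TITLE>\n  Example page </TITLE></head><body></body></html>";
echo extract_title($read), "\n"; // prints "Example page"
```

In the crawler this would be called as $title = extract_title($read); in place of the position arithmetic. An empty string comes back when the page has no title, so that case can be checked explicitly.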