Hi,

There are 2 points here:

1. In our repository, we have configured crawler access by putting up a robots.txt file with only one line:

   User-agent: *

I have checked with Webmaster Tools and it reports that crawler access was successful, but I am not quite sure this is actually OK. The problem is that internal error messages are being sent to me every day saying that the crawler cannot access certain pages. I have checked the handles attached to those messages and found that they point to non-existent pages... Can any of you please suggest what I should do to get rid of these errors?
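For reference, here is roughly how I understand the robots.txt options (a sketch only; I am not certain the empty-Disallow form is required, and the "block everything" record is just for illustration, not something we have configured):

   # What we have now (a single record, no Disallow line):
   User-agent: *

   # My understanding is that an explicit empty Disallow is the usual
   # "allow everything" form:
   #
   #   User-agent: *
   #   Disallow:

   # And that stopping crawlers entirely (see point 2 below) would be:
   #
   #   User-agent: *
   #   Disallow: /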
2. I also submitted sitemaps to Google. The latest result reported in Webmaster Tools is:

   Sitemap:            http://kb.psu.ac.th/psukb/sitemap
   Status:             OK
   Type:               Index
   Submitted:          17/7/2010
   Downloaded:         17/9/2010
   URLs submitted:     4,545
   URLs in web index:  3,785

Should I stop the crawler as mentioned in point 1? And what happened to the URLs that were reported as not in the web index?

Thanks.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand
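P.S. In case it helps clarify the "Type: Index" line above: my understanding is that http://kb.psu.ac.th/psukb/sitemap is a sitemap index pointing at the actual sitemap files, something like this minimal sketch (the ?map=0 / ?map=1 URLs are my assumption about how the sub-sitemaps are named, not verified against our installation):

   <?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <!-- each <sitemap> entry points to one file of item URLs -->
     <sitemap>
       <loc>http://kb.psu.ac.th/psukb/sitemap?map=0</loc>
     </sitemap>
     <sitemap>
       <loc>http://kb.psu.ac.th/psukb/sitemap?map=1</loc>
     </sitemap>
   </sitemapindex>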