Hi,

There are two points here:
1. In our repository, we have allowed crawlers to browse our site by
putting up a robots.txt that contains only one line:
   User-agent: *
I have checked with Webmaster Tools and it reports that crawler access
was successful, but I am not quite sure that this is OK.  The problem
is that internal error messages are sent to me every day saying that
the crawler cannot access certain pages.  I have checked the handles
attached to those messages and found that they are non-existent
pages.  Can anyone suggest what I should do to get rid of these
errors?
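For reference, my understanding of the standard "allow everything"
robots.txt is the two lines below; the explicit empty Disallow line is
the convention I have seen described, not something taken from the
DSpace docs:
   User-agent: *
   Disallow: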

2. I also submitted sitemaps to Google; the latest result reported in
Webmaster Tools is:
   Sitemap: http://kb.psu.ac.th/psukb/sitemap
   Status: OK
   Type: Index
   Submitted: 17/7/2010
   Downloaded: 17/9/2010
   URLs submitted: 4,545
   URLs in web index: 3,785
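As I understand it, "Type: Index" means the URL above serves a sitemap
index, i.e. an XML file that points to the actual sitemap files rather
than listing pages itself, roughly like the sketch below (the ?map=0
parameter is only my guess at how DSpace names its sub-sitemaps):
   <?xml version="1.0" encoding="UTF-8"?>
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <sitemap>
       <loc>http://kb.psu.ac.th/psukb/sitemap?map=0</loc>
     </sitemap>
   </sitemapindex>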

Should I stop the crawler as mentioned in point 1?  And what happened
to the URLs that were reported as not in the web index?

Thanks.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand
