Hi,

It appears that the default stats package (General Overview report) shows inflated numbers for item views and bitstream views (item downloads) because many web crawlers access our repositories to download the full text for indexing. The majority of traffic comes from machines rather than end users, which gives a distorted impression of DSpace usage: in our case, 15,000 downloads and 30,000 item views for January 2007.
For example, our Apache log shows a web crawler accessing the full text of a PDF:

74.6.86.213 - - [09/Feb/2007:12:04:45 +0000] "GET /dspace/bitstream/1983/898/1/webb_IEEE_vtc_spring2006.pdf HTTP/1.0" 200 178656 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

The corresponding entry in our DSpace log:

2007-02-09 12:04:45,120 INFO org.dspace.app.webui.servlet.BitstreamServlet @ anonymous:session_id=FED19B6B8FDDB615271F818BB7B766C4:ip_addr=137.222.120.28:view_bitstream:bitstream_id=1706

It would be nice to filter activity coming from web crawlers during log file analysis. Is it worth adding this as a feature request? Any ideas on how this can be achieved?

Thanks,
Naveed

--------------------------------------------------------
Naveed Hashmi
Information Systems and Computing
University of Bristol

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
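
One possible starting point for the filtering asked about above, as a rough sketch only and not anything DSpace provides: classify bitstream requests in the Apache access log by user agent. It assumes the Apache "combined" log format of the sample line above. The names LOG_PATTERN, CRAWLER_MARKERS, is_crawler and count_bitstream_requests are made up for illustration, the marker list is hypothetical and incomplete, and the second sample line is invented. The DSpace log entry shown above carries no user agent, so this covers the Apache side only.

```python
import re

# Apache "combined" log format, matching the sample line in this message.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, incomplete list of user-agent substrings treated as crawlers.
CRAWLER_MARKERS = ("slurp", "googlebot", "msnbot", "crawler", "spider")


def is_crawler(agent):
    """Return True if the user-agent string contains a known crawler marker."""
    agent = agent.lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def count_bitstream_requests(lines):
    """Count successful bitstream requests, split into crawler and other traffic."""
    counts = {"crawler": 0, "other": 0}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not in combined format, skip
        if "/bitstream/" not in match.group("path"):
            continue  # only downloads are counted in this sketch
        if match.group("status") != "200":
            continue  # ignore redirects, errors, etc.
        key = "crawler" if is_crawler(match.group("agent")) else "other"
        counts[key] += 1
    return counts


sample = [
    # Real line from the Apache log quoted in this message.
    '74.6.86.213 - - [09/Feb/2007:12:04:45 +0000] '
    '"GET /dspace/bitstream/1983/898/1/webb_IEEE_vtc_spring2006.pdf HTTP/1.0" '
    '200 178656 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
    # Invented line standing in for an ordinary browser request.
    '192.0.2.10 - - [09/Feb/2007:12:05:10 +0000] '
    '"GET /dspace/bitstream/1983/898/1/webb_IEEE_vtc_spring2006.pdf HTTP/1.1" '
    '200 178656 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
]

print(count_bitstream_requests(sample))
```

Matching on user-agent substrings is the simplest option but misses crawlers that do not identify themselves, so the resulting counts would still be an approximation.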