Hi Panyarak,

It might be an idea to add /displaystats to your JSPUI's robots.txt and to
any Google Webmaster Tools robots.txt files or Page Removal Requests.
For Google to de-index pages, it generally likes to see a 404 (not found) or
a 410 (gone).
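
For what it's worth, a minimal robots.txt entry might look like the one
below. I'm assuming the /psukb context path from the URL in your error
report, and that the file is served from the root of the same host and port
that Google is crawling. Disallow rules match by prefix, so this one line
should also cover /psukb/displaystats?handle=... without a trailing wildcard.

```
# Assumed path, taken from the URL in the error report; adjust to your setup
User-agent: *
Disallow: /psukb/displaystats
```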

Unfortunately, the servlet that handles statistics display for JSPUI throws
a NullPointerException when a handle is passed to it that doesn't turn into
a valid DSpace object. It *should* throw a friendly 404 to help crawlers
like Google realise the page is gone.

I've opened a JIRA issue for the NPE bug -
http://jira.dspace.org/jira/browse/DS-689 - and attached a patch for 1.6.2
(and trunk, and probably other 1.6.x versions) that will make sure that when
anyone (including Google) visits those pages, it sees a 404 instead of
"Internal Server Error".
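
To illustrate the idea (this is not the DS-689 patch itself, just a
self-contained sketch): check whether the handle resolved to anything before
using it, and answer "not found" if it didn't. The class name, the
KNOWN_OBJECTS table and the resolve() helper are all made up for the example;
in a real servlet the 404 branch would typically end in something like
response.sendError(HttpServletResponse.SC_NOT_FOUND).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in for the servlet's decision, not DSpace code.
class DisplayStatsSketch {
    // Hypothetical handle table; a real servlet would ask DSpace to resolve the handle.
    private static final Map<String, String> KNOWN_OBJECTS = new HashMap<>();
    static {
        KNOWN_OBJECTS.put("2553/100", "An existing item");
    }

    // Returns null when the handle is missing or does not map to an object.
    static String resolve(String handle) {
        return handle == null ? null : KNOWN_OBJECTS.get(handle);
    }

    // Picks the HTTP status instead of dereferencing a null object.
    static int statusFor(String handle) {
        String dso = resolve(handle);
        if (dso == null) {
            return 404; // unknown handle: tell the crawler the page is gone
        }
        return 200; // known handle: go on and render the statistics
    }

    public static void main(String[] args) {
        System.out.println(statusFor("2553/929")); // prints 404
        System.out.println(statusFor("2553/100")); // prints 200
    }
}
```

The point is only that the null check happens before the object is used, so
a stale handle produces a 404 rather than an exception.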

Hopefully this, along with /displaystats (and/or /displaystats* ?) in your
robots.txt files or removal requests, will help convince Google to stop
crawling those pages.

Cheers,

Kim

On 4 October 2010 13:52, Panyarak Ngamsritragul <pa...@me.psu.ac.th> wrote:

>
> Dear all,
>
> A couple of weeks ago I posted questions about the Google crawler and
> sitemaps.  There was a response from Vinit, but I still have not found a
> solution to what I am experiencing.
>
> I am running 1.6.2 and have registered the site (kb.psu.ac.th) with Google's
> webmaster tools.  As far as I understand, I have submitted the sitemaps.
> After some time, I began repeatedly receiving Internal Server Error messages
> as a result of the Google crawler trying to access some non-existent records.
> Some of these records have been accessed repeatedly by the crawler for more
> than a month now.
>
> Can anyone help me pinpoint the root cause of the problem?
> I have attached one of the error messages below.
>
> Thanks.
>
> Panyarak Ngamsritragul
> Prince of Songkla University.
> Thailand.
>
> ---------- Forwarded message ----------
> Date: Sun, 3 Oct 2010 18:50:06 +0700 (ICT)
> From: psukb-nore...@psu.ac.th
> To: psukb-h...@me.psu.ac.th
> Subject: PSUKB: Internal Server Error
>
> An internal server error occurred on http://kb.psu.ac.th/psukb:
>
> Date:       10/3/10 6:50 PM
> Session ID: D5E58233D9F2093B248C4CC5C65D96D1
> User:       Anonymous
> IP address: 66.249.69.1
>
> -- URL Was: http://kb.psu.ac.th:8080/psukb/displaystats?handle=2553/929
> -- Method: GET
> -- Parameters were:
> -- handle: "2553/929"
>
> Exception:
> java.lang.NullPointerException
>        at org.dspace.app.webui.servlet.DisplayStatisticsServlet.displayStatistics(DisplayStatisticsServlet.java:182)
>        at org.dspace.app.webui.servlet.DisplayStatisticsServlet.doDSGet(DisplayStatisticsServlet.java:123)
>        at org.dspace.app.webui.servlet.DSpaceServlet.processRequest(DSpaceServlet.java:151)
>        at org.dspace.app.webui.servlet.DSpaceServlet.doGet(DSpaceServlet.java:99)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:112)
>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>        at java.lang.Thread.run(Thread.java:619)
>
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>