Hello Alan,

This thread is 3 years old now, but our repositories are still experiencing the issues mentioned here. We are running DSpace 6.3, by the way. I read in one of your CGSpace notes (https://alanorth.github.io/cgspace-notes/2018-11/) that when you encountered crawlers making a lot of requests from different IP addresses, you added them to your Tomcat's crawler session manager. I strongly suspect that our repository 'hanging' is also caused by the volume of requests from these crawlers (mostly crawlers with the user agents facebookexternalhit, Turnitin, and Unpaywall).
With regard to this, I hope you don't mind sharing your Tomcat Crawler Session Manager settings here. Note that I have already modified my postgresql.conf based on the discussion here: https://wiki.lyrasis.org/display/cmtygp/DCAT+Meeting+April+2017.

Thanks, and hoping for your positive response,
euler

On Friday, July 7, 2017 at 7:41:52 PM UTC+8, Alan Orth wrote:
>
> Hello,
>
> I've struggled with this in various forms over the seven years or so
> we've been running DSpace. High load on public servers can easily
> exhaust PostgreSQL connection slots. The easy answer is to increase the
> connection limits, but before that it's better to understand why the
> system load is increasing. Here are a few tips.
>
> The easiest thing is to enable DSpace's XML sitemaps. Search engines
> like Google really hammer the repository as they crawl and click all
> sorts of dynamic links in the Browse and Discovery sidebar. Instead,
> you register your web property with Google Webmaster Tools and give
> them the path to your sitemap so they can get to each item directly
> without crawling haphazardly. Once you're sure Google is consuming your
> sitemap, you can block them from the dynamic pages in robots.txt.
> Here's the link on the wiki for DSpace 4:
>
> https://wiki.duraspace.org/display/DSDOC4x/Search+Engine+Optimization
>
> Second, look at your web server access logs. You might see many
> requests from bots like Bing, Yandex, Google, Slurp, etc., and notice
> that they are all coming from different IP addresses, sometimes from
> five or ten concurrently! Another place you might see this is in the
> "Current Activity" tab in the DSpace Admin UI control panel. The
> problem with this is that each of these connections creates a new
> Tomcat session, which consumes precious memory, CPU, and other
> resources.
> You can enable a Crawler Session Manager Valve in your Tomcat config,
> which will tell Tomcat to make all user agents matching a certain
> pattern use a single session. There are some notes from me in the
> comments here:
>
> https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017
>
> And finally, the last link also contains a discussion, from a recent
> developers meeting, about updating the DSpace defaults for PostgreSQL
> connections.
>
> I hope that helps. Cheers,
>
> On Fri, Jul 7, 2017 at 12:57 AM christian criollo <[email protected]> wrote:
>
>> Hello Alan,
>>
>> Yes, the repository is public. Thanks for your answer.
>>
>> On Thursday, July 6, 2017 at 2:09:59 (UTC-5), Alan Orth wrote:
>>
>>> Hello,
>>>
>>> Is your repository public? It could be that you are getting lots of
>>> traffic from search bots or people harvesting via REST / OAI... this
>>> would definitely increase the load on the server and create more
>>> database connections.
>>>
>>> Ciao,
>>>
>>> On Wed, May 24, 2017 at 11:23 PM christian criollo <[email protected]> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> For the last month, our repository has been showing faults like this:
>>>>
>>>> *org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object*
>>>>
>>>> I set the max connections variable in dspace.cfg to 100, but the
>>>> system still behaves the same. I see that the sessions in Tomcat
>>>> keep increasing, and I don't know what's wrong. If somebody can
>>>> tell me what I can do to fix this error, please do. Thanks for the
>>>> help.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> Alan Orth
>>> [email protected]
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/4990e2ca-88d7-453e-8f6f-9f859494637do%40googlegroups.com.
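
As a rough illustration of the access-log check suggested in the quoted message (many crawler requests arriving from different IP addresses), here is a minimal Python sketch. It assumes the web server writes the common "combined" log format. The CRAWLER_PATTERN regex, the summarize_crawlers helper, and the sample log lines are made up for illustration and are not taken from anyone's actual configuration.

```python
import re
from collections import defaultdict

# Assumed "combined" log format:
# IP identity user [time] "request" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical pattern built from agent names mentioned in this thread;
# adjust it to whatever shows up in your own logs.
CRAWLER_PATTERN = re.compile(
    r"bot|slurp|yandex|facebookexternalhit|turnitin|unpaywall", re.IGNORECASE
)


def summarize_crawlers(lines):
    """Return {user_agent: (request_count, distinct_ip_count)} for crawler-like agents."""
    requests = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        if CRAWLER_PATTERN.search(agent):
            requests[agent] += 1
            ips[agent].add(match.group("ip"))
    return {agent: (requests[agent], len(ips[agent])) for agent in requests}


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP ranges.
    sample = [
        '192.0.2.10 - - [07/Jul/2017:10:00:00 +0000] "GET /handle/123456789/1 HTTP/1.1" 200 512 "-" "facebookexternalhit/1.1"',
        '192.0.2.11 - - [07/Jul/2017:10:00:01 +0000] "GET /discover HTTP/1.1" 200 1024 "-" "facebookexternalhit/1.1"',
        '198.51.100.7 - - [07/Jul/2017:10:00:02 +0000] "GET /browse HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"',
        '203.0.113.5 - - [07/Jul/2017:10:00:03 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    summary = summarize_crawlers(sample)
    # Busiest agents first.
    for agent, (count, distinct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
        print(f"{count:6d} requests from {distinct:4d} IPs  {agent}")
```

Agents that show many requests spread over many distinct IPs are the likely candidates for the user-agent pattern of the Crawler Session Manager Valve discussed above.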
