To continue the discussion on a slightly related note: I've just finished
dealing with the fallout caused by some new bot — the only fingerprint of
which is its unique-but-normal-looking user agent — hitting our XMLUI with
450,000 requests from six different IPs over just a few hours. This
generated a ridiculous amount of load on the server, including 160
PostgreSQL connections and 52,000 Tomcat sessions before I was able to
mitigate it. Surprisingly, since I had increased our pool size to 300 after
my last message, we never got pool timeout or database connection errors in
dspace.log, but the site was very unresponsive — and this is on a beefy
server with SSDs, plenty of RAM, large PostgreSQL buffer cache, etc! I
ended up having to rate limit this user agent in our frontend nginx web
server using the limit_req module's limit_req_zone directive[0].
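
For anyone trying to spot a similar bot, here is a minimal sketch of how
one might tally requests per user agent from an nginx access log. It
assumes nginx's default "combined" log format; the function name
count_by_agent and the sample user agent are made up for illustration and
are not what I actually ran.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" log format, where each line ends with
# the quoted referer followed by the quoted user agent.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')


def count_by_agent(lines):
    """Return (requests per user agent, set of client IPs per user agent)."""
    hits = Counter()
    ips = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        hits[agent] += 1
        ips.setdefault(agent, set()).add(match.group("ip"))
    return hits, ips


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python count_agents.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hits, ips = count_by_agent(log)
    for agent, count in hits.most_common(10):
        print(f"{count:>8}  {len(ips[agent]):>4} IPs  {agent}")
```

A user agent with a very high request count coming from only a handful of
IPs is the kind of pattern I described above.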

So it's a mix of success and frustration here. No amount of pool
tweaking will fix this type of issue, because there's always another
bigger, stupider bot that comes along eventually and doesn't match the
"bot" user agent. I will definitely look into implementing separate pools
as Tom had suggested, though, to limit the damage caused by high load to
certain DSpace web applications. Keep sharing your experiences! This is
very valuable and interesting to me.
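
As a rough way to sanity-check a per-webapp split like the one Tom
describes (quoted below), here is a small sketch of the arithmetic. The
pool sizes and the limit of 100 come from his message; the headroom of 30
is simply derived from his "around 70" figure, and the function name
check_pool_budget is hypothetical.

```python
def check_pool_budget(pools, max_connections, headroom):
    """Return how many connections remain after reserving every pool.

    pools           -- mapping of webapp name to its pool's maximum size
    max_connections -- PostgreSQL's max_connections setting
    headroom        -- connections left free for everything else, such as
                       the command-line tools, which use their own pool
    """
    usable = max_connections - headroom
    allocated = sum(pools.values())
    if allocated > usable:
        raise ValueError(
            f"pools need {allocated} connections but only {usable} are usable"
        )
    return usable - allocated


# Figures from Tom's message: a default limit of 100, roughly 70 usable.
pools = {"oai": 15, "rest": 15, "xmlui": 40}
spare = check_pool_budget(pools, max_connections=100, headroom=30)
print(f"{sum(pools.values())} connections allocated, {spare} to spare")
```

With these numbers the three pools use the whole budget of 70, so growing
any one of them means shrinking another or raising the PostgreSQL limit.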

[0]
https://github.com/ilri/rmg-ansible-public/commit/368faaa99028c8e0c8a99de3f6c253a228d5f63b

Cheers!

On Thu, Jan 4, 2018 at 7:31 AM Alan Orth <alan.o...@gmail.com> wrote:

> That's a cool idea to use a separate pool for each web application, Tom!
> I'd much rather have my OAI fail to establish a database connection than my
> XMLUI. ;)
>
> Since I wrote the original mailing list message two weeks ago I've had
> DSpace fail to establish a database connection a few thousand times and
> I've increased my pool's max active from 50 to 75 and then 125 — our site
> gets about four million hits per month (from looking at nginx logs), so I'm
> still trying to find the "sweet spot" for the pool settings. Anything's
> better than setting the pool in dspace.cfg, though.
>
> I wish other people would share their pool settings and experiences.
>
> On Wed, Jan 3, 2018 at 2:40 PM Hardy Pottinger <hardy.pottin...@gmail.com>
> wrote:
>
>> Hi, please do create this wiki page, I'd love to read it. Thanks!
>>
>> --Hardy
>>
>> On Wed, Jan 3, 2018 at 4:10 PM, Tom Desair <tom.des...@atmire.com> wrote:
>>
>>> I just wanted to add a small note that having 1 single DB pool for all
>>> Tomcat webapps can lead (and has led) to problems. Your current pool size is
>>> 50. This means that if you have (malicious) crawlers hitting your OAI
>>> endpoint, this can deplete the database connections available for
>>> the web UI (XMLUI or JSPUI). The other way around can also happen.
>>>
>>> But using JNDI DB pools also gives you more fine-grained control over the
>>> connection distribution across the different web apps. For example, a default
>>> PostgreSQL installation comes with a max connection limit of 100. This
>>> means you can safely use around 70 connections (from experience). You can
>>> then divide these connections with individual JNDI pools like this:
>>>
>>>    - OAI: 15 connections
>>>    - REST: 15 connections
>>>    - WEB UI: 40 connections
>>>
>>>
>>> Let me know if you've created a JNDI DB pool wiki page. I'll then try to
>>> add some useful information on JDBC interceptors (
>>> https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html#Configuring_JDBC_interceptors
>>> ).
>>>
>>>
>>> Tom Desair
>>> 250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
>>> Gaston Geenslaan 14, Leuven 3001, Belgium
>>> www.atmire.com
>>>
>>> 2018-01-03 22:36 GMT+01:00 Tim Donohue <tdono...@duraspace.org>:
>>>
>>>> Hi Alan & Mark,
>>>>
>>>> These notes look like the start to some enhanced documentation around
>>>> setting up DSpace + Tomcat JNDI (hint, hint).
>>>>
>>>> I'm wondering (out loud) if we should take these concepts/ideas and
>>>> turn them into official documentation in the "Installing DSpace" section
>>>> (maybe somewhere under "Advanced Installation"?):
>>>> https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace
>>>>
>>>> Thanks though for sharing the notes and Q&A here. I think this will be
>>>> very helpful for others who wish to go this route.
>>>>
>>>> - Tim
>>>>
>>>>
>>>> On Wed, Jan 3, 2018 at 3:17 PM Mark H. Wood <mwoodiu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for posting these notes.  I'm sure they will be helpful.
>>>>> You've shown some tools that I didn't know about.
>>>>>
>>>>> A pool instantiated by DSpace is probably effectively invisible to
>>>>> other webapp.s even in the same JVM.  The Servlet spec. tries very hard to
>>>>> create the illusion that each webapp. is floating in a kind of womb, where
>>>>> everything it needs is mysteriously provided for it from somewhere beyond
>>>>> its perception.  Each has its own classloader, for example, and so things
>>>>> that a webapp. creates for itself tend to be known only in places that are
>>>>> not accessible by other webapp.s.  I could wish that DSpace made more
>>>>> thorough use of the Servlet environment rather than behaving as if it is
>>>>> standalone code.
>>>>>
>>>>> You're quite correct that the command-line tools don't share a pool
>>>>> with any of the webapp.s, because the launcher runs in a different process
>>>>> with its own address space.  This is one reason to continue specifying pool
>>>>> settings in dspace.cfg -- IMO this should be the *only* use of those
>>>>> settings.  It *is* possible to supply a pool to the command line out of
>>>>> JNDI -- I've done it -- but you need to supply a directory service to the
>>>>> process.  I can say a little about that if anybody is interested.  You
>>>>> could provide in dspace.cfg settings more appropriate to the command line,
>>>>> if your webapp.s are set up with pools (tuned for their needs) from JNDI.
>>>>>
>>>>> The reason you don't have to tinker with directory services for
>>>>> webapp.s is that the <Resource> and <ResourceLink> elements are causing
>>>>> your Servlet container (Tomcat) to populate an internal directory service
>>>>> with objects such as your connection pool.  This is specified by Java EE,
>>>>> but many Servlet containers implement it even when not required by the
>>>>> relevant spec.s.
>>>>>
>>>>> You *do* need to supply any DBMS drivers to the container itself,
>>>>> because the pool and connections are created by the container and so must
>>>>> be visible from *its* classloader(s), which (in Tomcat anyway) are on a
>>>>> branch of the hierarchy that is parallel to those of the webapp.s.  I also
>>>>> would use the latest released driver.
>>>>>
>>>>> It should be simple to provide a log message when the resolution of
>>>>> jdbc/dspace *succeeds*, and I think we should.  There's already a Jira
>>>>> issue about making the other case less scary, and perhaps this should be
>>>>> included in that work.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "DSpace Technical Support" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to dspace-tech+unsubscr...@googlegroups.com.
>>>>> To post to this group, send email to dspace-tech@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> --
>>>> Tim Donohue
>>>> Technical Lead for DSpace & DSpaceDirect
>>>> DuraSpace.org | DSpace.org | DSpaceDirect.org
>>>>
>>>
>>
>
>
> --
>
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
>


-- 

Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
