Hello Tibor,

> On Thu, 03 Mar 2011, Ferran Jorba wrote:
>> I've looked at them and I have noticed, embarrassingly, that we have a
>> lot of room for improvement, as we have much lower values (probably
>> Debian defaults; I'm not an expert at all) for our 8 GB server.
>
> MySQL comes with example configurations; on Debian, see under
> /usr/share/doc/mysql-server-5.1/examples. You can typically take
> my-huge.cnf with good results.
I have looked at the Debian examples from time to time. On our older
(2 GB, 32-bit) server we used this my-huge example. But the header of
my-huge.cnf reads:

# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.

So it seems we have a 4 * very-large-system now ;-). Seriously, I'm
afraid we overlooked those values when we upgraded to our new system.
Everything was so fast that we didn't care. But now, with the full-text
index activated, updating records with documents that sum up to several
thousand pages of PDF, it may take hours for bibindex to complete its
tasks.

> (Especially since at DDD you don't have 2M of records with 20M
> citation pairs, so my-huge.cnf should be already fine for you.)

Sure? 1-2 GB is not 8 GB. That's why Cornelia's sample, with their
16 GB machine, was more attractive to me, especially because apparently
they did some research before tuning it up.

> BTW, I have put up some notes about MySQL tuning on our old wiki:
>
> <https://twiki.cern.ch/twiki/bin/view/CDS/InvenioTuning#1_Tuning_MySQL>

I forgot about this page, thanks for pointing it out.

> The max_connections part on the wiki and musings about the number of
> Apache processes are applicable when you run mod_python or mod_wsgi
> embedded mode. With mod_wsgi daemon mode, only the backend daemons are
> connecting to MySQL, so things are much better from this point of view.

Great.

> I'll refresh the page a bit when moving it to Trac, in a few weeks.
>
>> I have been waiting for a follow-up from her question about
>> max_allowed_packet, but as I haven't received any, I'd like to ask
>> now.
>
> (Yeah, I hope to get to replying to Cornelia one of these days.)
>
> Bigger max_allowed_packet is needed for serialised Python objects,
> especially big citation dictionaries that are living as single blobs.
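(Concretely, this is the kind of [mysqld] fragment I was thinking of
trying on the 8 GB box; the values below are my own guesses, not tested
recommendations:

```ini
[mysqld]
# Guessed values for a dedicated 8 GB MySQL server -- untested.
key_buffer_size    = 2G     # my-huge.cnf ships with a much smaller value
table_open_cache   = 512
sort_buffer_size   = 2M
read_buffer_size   = 2M
max_allowed_packet = 100M   # room for large serialised blobs
```

The values actually in effect can be checked from the mysql client with
SHOW VARIABLES LIKE 'key_buffer_size'; and so on.)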
> If you don't use citation ranking on your Invenio instance, then the
> default max_allowed_packet value should be just fine. Otherwise you can
> raise it to something like 100M or 1G, depending on the number of
> citer-citee pairs.

If I understood Samuele well last summer, when he was explaining to me
how full text works, with the list of records being compressed and
uncompressed (serialised and deserialised) continuously, a word that is
found in several thousand documents has all their record ids in a
single blob, right? For example:

http://ddd.uab.cat/search?sc=1&f=fulltext&p=of (21,758 records)
http://ddd.uab.cat/search?sc=1&f=fulltext&p=barcelona (25,633 records)

Would that warrant a bigger max_allowed_packet value? We don't have
many bibliographic records because we do have many documents under a
single record, but we do hold more than 80,000 PDFs that sum up to more
than 2 million pages, so the indexes may be quite large anyway.

Thanks,

Ferran
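P.S. To get a feeling for the blob sizes involved, I played with a toy
sketch, assuming a term entry is stored as a zlib-compressed marshalled
list of record ids (the actual Invenio storage format may well differ):

```python
import zlib
from marshal import dumps, loads

def pack_recids(recids):
    """Serialise and compress a collection of record ids into one blob,
    roughly like a full-text index term entry (toy model, not the real
    Invenio format)."""
    return zlib.compress(dumps(sorted(recids)))

def unpack_recids(blob):
    """Reverse of pack_recids: decompress and deserialise."""
    return loads(zlib.decompress(blob))

# A term appearing in ~25,000 records, like 'barcelona' above:
blob = pack_recids(range(25633))
assert unpack_recids(blob) == list(range(25633))
# The whole blob travels in a single query, so it must fit within
# max_allowed_packet on both server and client:
print(len(blob), "bytes")
```

With dense consecutive ids the blob compresses very well; for real,
sparse id sets it would be larger, which is presumably why the
max_allowed_packet headroom matters.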
