Hello everyone,

As you may know, I've been having some problems with Googlebot and Invenio's sessions. In fact, the user and session tables are growing way too fast. As an example, here are some access stats for the last 24 hours (Infoscience prod server):
new sessions: 616'727 (session.MYD => 205MB, session.MYI => 30MB)

Googlebot, MSNbot and Yahoo Slurp accept cookies but do not send them back. Therefore, each access creates a new session. Moreover, it's difficult to clean up these sessions, as inveniogc takes a really long time to run (I have never seen it finish, but I'm a bit impatient). Yesterday I tried to run a simple 'select * from session' (after one week without any garbage collection) and was forced to restart mysqld, as nobody could access Infoscience anymore (OK, this was a really bad idea, but I wanted to see whether any improvement to inveniogc could make it work better).

After discussing this issue with Sam, we came to the conclusion that robots should not receive any session. As Invenio is really close to a new release, I did not modify anything in CVS, and created a patch instead. The modification is really simple, but as it takes place inside the getUid function (probably one of the most used in Invenio), it could perhaps slow things down.
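
To give a rough idea of the approach, here is a minimal sketch of the technique (it is not the attached patch itself). It checks the User-Agent header against a short list of crawler names and skips session creation on a match. All the names below (ROBOT_USER_AGENT_FRAGMENTS, GUEST_UID, is_robot, get_uid_sketch) and the create_session callback are hypothetical and only illustrate the idea; the real getUid has a different signature.

```python
import re

# Hypothetical list of User-Agent fragments identifying crawlers.
# The values actually used in the patch may differ.
ROBOT_USER_AGENT_FRAGMENTS = ("googlebot", "msnbot", "slurp")

# Compile once at import time so the per-request check stays cheap,
# since the check would run on every call of a very hot function.
_ROBOT_RE = re.compile(
    "|".join(re.escape(fragment) for fragment in ROBOT_USER_AGENT_FRAGMENTS),
    re.IGNORECASE,
)

# Assumed uid for an anonymous visitor that has no stored session.
GUEST_UID = 0


def is_robot(user_agent):
    """Return True if the User-Agent string looks like a known crawler."""
    return bool(user_agent) and _ROBOT_RE.search(user_agent) is not None


def get_uid_sketch(user_agent, create_session):
    """Return a uid, creating a session only for non-robot clients.

    create_session is a hypothetical callback standing in for whatever
    stores a new session row and returns the uid attached to it.
    """
    if is_robot(user_agent):
        # No row is written to the session table for crawlers.
        return GUEST_UID
    return create_session()
```

With a single precompiled regular expression, the extra cost per request is one search over a short string, which should be small compared to the database insert it avoids for every robot hit.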

Best regards,
Greg

Attachment: webuser.py.patch
Description: Binary data

Attachment: websession_config.py.patch
Description: Binary data


____________________________________________________________________

Gregory Favre
Coordinateur Infoscience
École Polytechnique Fédérale de Lausanne
KIS - DIT
Case Postale 121
CH-1015 Lausanne
+41 21 693 22 88
+41 79 526 52 13
gregory.fa...@epfl.ch
http://plan.epfl.ch/?sciper=128933
____________________________________________________________________


