Hello devs,

I'm writing to explain a problem that my company, MpStyle, faced with an OFBiz installation running two active ecommerce sites for one of our customers. I am sending this email to the dev mailing list because I could not find any reference to this kind of problem in the mailing list archives, and I think the solution we built could improve OFBiz's visit/visitor tracking capabilities.
Here is a short description of the server architecture on which OFBiz runs. The servers are hosted by a third-party supplier: there are two (virtual) machines where Apache OFBiz 13.07.03 runs behind the Apache2 web server (so we have two web fronts), and two other machines host the database (MariaDB) and HAProxy, which acts as the load balancer. HAProxy is configured to perform its health checks on the backend servers with an HTTP GET on the home page of one of the two sites.

Visit and Visitor tracking are enabled for BI and analytics purposes, so we cannot turn them off. The combination of these two things caused the Visit and Visitor tables to explode in size (we counted about 19M Visit records and about 67M Visitor records, 86% of them caused by the load balancer), because each HAProxy hit stores a Visit and a Visitor record in the database (plus records of other entities, such as ShoppingList, due to the <firstvisit> and <preprocessor> events). In the long run, this leads to an overall performance degradation and to longer windows of web front unavailability during the day; needless to say, our customer was not very happy about this. What made the problem hard to figure out was that we did not have direct access to the HAProxy and database machines to check their logs.

The solution we devised and implemented was to exclude specific IP addresses from Visit/Visitor tracking (in our case, the HAProxy IP). The Visit and Visitor records (along with the firstvisit and preprocessor events) are created mainly in the ControlServlet class, using the VisitHandler getVisit/getVisitor/getVisitId methods. Our idea consists of reading one or more IP addresses to exclude from tracking from a .properties file and checking them against the IP address of the client the request comes from. If the client IP address is in the "exclusion list", the visit/visitor is not persisted and the firstvisit events are not run for it either.
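A minimal sketch of the exclusion check described above, written in plain Java so it runs on its own. It is not the actual patch: the class name `VisitTrackingExclusion` and the property key `visit.tracking.excluded.ips` are hypothetical, and the exclusion list is assumed to be a comma-separated value in a standard .properties file. In ControlServlet the check would presumably wrap the VisitHandler calls, with the client address taken from `request.getRemoteAddr()`. Depending on how Apache2 forwards requests to OFBiz (AJP or HTTP proxy), that address may be the web server's rather than HAProxy's, so the value to exclude should be verified on the real setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

/**
 * Sketch of an IP exclusion list for visit/visitor tracking.
 * Class name and property key are hypothetical, not part of OFBiz.
 */
public class VisitTrackingExclusion {

    // Hypothetical key holding a comma-separated list of IP addresses.
    public static final String EXCLUDED_IPS_KEY = "visit.tracking.excluded.ips";

    private final Set<String> excludedIps;

    public VisitTrackingExclusion(Properties props) {
        Set<String> ips = new HashSet<>();
        // A missing key means "exclude nothing".
        String raw = props.getProperty(EXCLUDED_IPS_KEY, "");
        for (String token : raw.split(",")) {
            String ip = token.trim();
            if (!ip.isEmpty()) {
                ips.add(ip);
            }
        }
        this.excludedIps = Collections.unmodifiableSet(ips);
    }

    /** Builds the exclusion list from .properties-formatted text. */
    public static VisitTrackingExclusion fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new VisitTrackingExclusion(props);
    }

    /**
     * Returns false when the client address is on the exclusion list,
     * i.e. no Visit/Visitor should be persisted and no firstvisit events run.
     * An unknown (null) address is tracked, so tracking stays on by default.
     */
    public boolean shouldTrack(String clientIp) {
        return clientIp == null || !excludedIps.contains(clientIp.trim());
    }

    public static void main(String[] args) {
        VisitTrackingExclusion exclusion =
                fromText(EXCLUDED_IPS_KEY + "=192.0.2.10, 192.0.2.11\n");
        // In ControlServlet this would come from request.getRemoteAddr() (assumption).
        String clientIp = "192.0.2.10";
        if (exclusion.shouldTrack(clientIp)) {
            System.out.println("track visit");
        } else {
            System.out.println("skip visit, visitor and firstvisit events");
        }
    }
}
```

Running `main` prints `skip visit, visitor and firstvisit events`, because 192.0.2.10 is on the list. Exact string matching against a set keeps the per-request cost to a single hash lookup, which matters since the check runs on every hit, health checks included.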
The idea is quite simple, but within a few days we noticed a meaningful improvement in overall system performance and stability/availability. This kind of exclusion could also be useful when we do not want to track or record internal IP addresses (e.g., those used mainly for testing). However, this solution should be combined with a service (a cron job or a job scheduled in OFBiz) that keeps the number of records in these tables bounded (for example, keeping only the last month of visits/visitors); I think these two solutions together could do the job well.

I hope my explanation was clear enough, and I would be happy to hear what you think about it.

Thank you all for your attention!

Regards,
Giulio

--
Giulio Speri
*Mp Style Srl*
via Antonio Meucci, 37
41019 Limidi di Soliera (MO)
T 059/684916
M 334/3779851
www.mpstyle.it