Hello Giulio,

Thanks for the detailed and clear message. My understanding of your proposal is as follows:
1) We should add configuration settings to skip visit entries for internal IPs and known requests (such as HaProxy/load balancer health checks and monitoring requests).
2) To handle the large DB size caused by visits and hits, we can use a separate stats database for the visit entity group.
3) The idea of purging old visits with a scheduled job is also good. We can make the number of days to keep configurable as needed.

--
Best Regards,
Arun Patidar
www.hotwax.co

On Fri, Apr 12, 2019 at 5:17 AM Giulio Speri - MpStyle Srl <giulio.sp...@mpstyle.it> wrote:

> Hello devs,
>
> I'm writing because I would like to explain a problem my company, MpStyle,
> faced with an OFBiz installation with two active ecommerce sites, for one
> of our customers.
> I am writing this email to the dev mailing list because I could not find
> any reference in the mailing lists to the kind of problem we faced, and I
> think that the solution we built could be an improvement to the OFBiz
> visit/visitor tracking capabilities.
>
> Let me briefly explain the server architecture on which OFBiz is running:
> hosted by a third party supplier, there are two (virtual) machines where
> Apache OFBiz 13.07.03 is running behind the Apache2 web server (so we have
> two web fronts).
> On two other machines there are the database (MariaDB) and HaProxy as a
> load balancer.
> HaProxy is configured to perform its health checks on the backend servers
> with an HTTP GET on the home page of one of the two sites.
> Visit and Visitor tracking are enabled, for BI and analytics purposes, so
> we cannot turn them off.
> These two things combined caused the Visit and Visitor tables to explode in
> size (we counted about 19M records of Visit and about 67M of Visitor, with
> 86% of those caused by the load balancer), since each hit from HaProxy
> stores a Visit and a Visitor record on the db (plus some other records of
> other entities, like ShoppingList, due to <firstvisit> and <preprocessor>
> events).
> A bad side effect of this situation, in the long run, is an overall
> performance degradation and an increase in the time windows during the day
> in which the web fronts are unavailable: needless to say, our customer was
> not so happy about this.
>
> The difficult part of figuring out this problem was that we did not have
> direct access to the HaProxy and DB machines to check logs.
>
> The solution we devised and implemented was to exclude specific IP
> addresses from Visit/Visitor tracking (in our case we were interested in
> the HaProxy IP).
> The Visit and Visitor records (along with firstvisit and preprocessor
> events) are created mainly in the ControlServlet class, using the
> VisitHandler getVisit/getVisitor/getVisitId methods.
>
> Our idea consists of reading from a .properties file one or more IP
> addresses we would like to exclude from tracking, and then checking them
> against the client IP address the request is coming from.
> If the client IP address is in the "exclusion list", then we do not persist
> visit/visitor and do not run firstvisit events for it either.
>
> The idea is quite simple, but within a few days we noticed a meaningful
> improvement in overall system performance and stability/availability.
>
> This kind of exclusion could also be useful in case we do not want to track
> or register internal IP addresses (i.e. those mainly used for testing).
>
> However, this solution should be combined with a service (cron or
> scheduled in OFBiz) that keeps the number of records in the tables limited
> (for example, keeping only the last month of visit/visitor); I think that
> these two solutions together could do the job well.
>
> I hope my explanation was clear enough, and I would be happy to know what
> you think about this.
>
> Thank you all for your attention!
>
> Regards,
>
> Giulio
>
> --
> Giulio Speri
>
> *Mp Style Srl*
> via Antonio Meucci, 37
> 41019 Limidi di Soliera (MO)
> T 059/684916
> M 334/3779851
>
> www.mpstyle.it
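
For discussion, here is a minimal sketch of the exclusion-list check described in the quoted message: read a comma-separated list of IP addresses from a .properties file and skip tracking when the client IP matches. The class name `VisitExclusion`, the method names, and the property key `visit.tracking.excluded.ips` are all hypothetical and chosen only for this example; none of them is an existing OFBiz class or setting, and how this would hook into ControlServlet/VisitHandler is left open.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of an IP exclusion check for visit tracking (not actual OFBiz code). */
final class VisitExclusion {
    // Hypothetical property key, used for illustration only.
    static final String EXCLUDED_IPS_KEY = "visit.tracking.excluded.ips";

    private final Set<String> excludedIps;

    VisitExclusion(Properties props) {
        // Comma-separated list; blanks and surrounding whitespace are ignored.
        String raw = props.getProperty(EXCLUDED_IPS_KEY, "");
        this.excludedIps = Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(ip -> !ip.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Builds the check from .properties-formatted text (e.g. a file's contents). */
    static VisitExclusion fromPropertiesText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new VisitExclusion(props);
    }

    /** Returns false when the client IP is in the exclusion list. */
    boolean shouldTrack(String clientIp) {
        if (clientIp == null) {
            return true; // unknown address: keep the default behaviour and track
        }
        return !excludedIps.contains(clientIp.trim());
    }

    public static void main(String[] args) {
        VisitExclusion exclusion = fromPropertiesText(
                "visit.tracking.excluded.ips = 10.0.0.5, 192.168.1.10\n");
        System.out.println("load balancer tracked: " + exclusion.shouldTrack("10.0.0.5"));
        System.out.println("customer tracked: " + exclusion.shouldTrack("203.0.113.7"));
    }
}
```

The caller would evaluate `shouldTrack(...)` before persisting Visit/Visitor records or running firstvisit events. The addresses above are placeholders, and exact-match comparison is an assumption of this sketch; matching whole subnets for internal IPs would need a different check.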