Since we're talking about our demo instances on the infrastructure of the ASF I suggest getting in touch with INFRA and work out a solution with them that favours both parties. They surely will have monitoring solutions in place and can advice on what is achievable.
Best regards, Pierre Smits Apache Trafodion <https://trafodion.apache.org>, Vice President Apache Directory <https://directory.apache.org>, PMC Member Apache Incubator <https://incubator.apache.org>, committer *Apache OFBiz <https://ofbiz.apache.org>, contributor (without privileges) since 2008* Apache Steve <https://steve.apache.org>, committer On Fri, Aug 24, 2018 at 4:36 PM Jacques Le Roux < jacques.le.r...@les7arts.com> wrote: > Agreed, I have used VisualVMin the past, it's a simple and efficient tool > > I have planned to make a VOTE about options if needed. Let's see if it > will be necessary (consensus being preferred) > > Jacques > > > Le 24/08/2018 à 16:06, Girish Vasmatkar a écrit : > > Speaking of monitoring tools and if we don't want to go for third party > > tools, we can also use VisualVM that comes bundled with Oracle JDK. It > can > > connect to the remote VM (OFBiz process) and start displaying various > > information. > > > > Very minimal configuration is needed in the form of VM argument to allow > > for remote monitoring. Also, to enable further analysis of what went > wrong, > > why JVM crashed etc, we should also dump heap as the JVM shuts down. > > > > Too many ways and too many options. Probably need to reach a unanimous > > decision, IMO. > > > > Thanks and Best regards, > > Girish Vasmatkar > > > > On Fri, Aug 24, 2018 at 4:56 PM Jacques Le Roux < > > jacques.le.r...@les7arts.com> wrote: > > > >> Thanks Michael, > >> > >> Best idea so far! > >> > >> Jacques > >> > >> > >> Le 24/08/2018 à 11:08, Michael Brohl a écrit : > >>> We are monitoring our OFBiz instances with JMX and self hosted Zabbix > >> [1]. > >>> Zabbix gives you a nice overview about the system health and metrics > >> like memory consumption etc. It also sends out warnings (Email, SMS or > >> else) > >>> if metrics are exceeded (like CPU load or memory consumption) as well > as > >> the system is not accessible. > >>> Looks like this: [2] > >>> > >>> There is no programming needed, just some configuration for JMX and > >> Zabbix. > >>> [1] https://www.zabbix.com/ > >>> [2] > >> https://www.ecomify.de/wp-content/uploads/2018/08/Zabbix_Monitoring.png > >>> If we want to see why the demos crash, it might be useful. If we only > >> want to monitor if the system is up, a simple cron job which sends a > mail > >>> might be enough... > >>> > >>> Regards, > >>> > >>> Michael Brohl > >>> ecomify GmbH > >>> www.ecomify.de > >>> > >>> > >>> Am 24.08.18 um 10:07 schrieb Taher Alkhateeb: > >>>> Okay all neat ideas, I'm not sure if the energy you will put into > >> something > >>>> like this is equal to the value produced but if you want to make this > >>>> happen I would be happy to assist. > >>>> > >>>> How much time will it take to make something like this happen? I ask > >>>> because it seems Jacques ia getting annoyed with these crashes and > we'd > >>>> like to help him out. > >>>> > >>>> On Fri, Aug 24, 2018, 10:59 AM Girish Vasmatkar < > >>>> girish.vasmat...@hotwaxsystems.com> wrote: > >>>> > >>>>> Hi Taher > >>>>> > >>>>> Please see my reply below in-line. > >>>>> > >>>>> On Fri, Aug 24, 2018 at 12:22 PM Taher Alkhateeb < > >>>>> slidingfilame...@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> Hi Girish, inline... > >>>>>> > >>>>>> On Thu, Aug 23, 2018, 7:25 PM Girish Vasmatkar < > >>>>>> girish.vasmat...@hotwaxsystems.com> wrote: > >>>>>> > >>>>>>> I had earlier replied to this thread but looks like the email did > not > >>>>> go > >>>>>>> through. I had leaned towards using the tool (only just) instead of > >> may > >>>>>> be > >>>>>>> having a CRON job or an alternative. > >>>>>>> > >>>>>>> What I feel now is that may be we can use JMX here and try to use > >>>>> various > >>>>>>> in build MBeans that provide CPU usage for the system and also for > >> the > >>>>>> JVM > >>>>>>> process we are concerned about that is OFBiz instance. We should > also > >>>>> be > >>>>>>> able to get the memory usage of the JVM and if reaches a particular > >>>>>>> threshold we can be notified. > >>>>>>> > >>>>>> Do you have a PoC for all of this? > >>>>>> > >>>>> GV : I can have one ready; and there is going to be much doing > >> involved. > >>>>>>> In addition, I think we already add a shutdown hook to the JVM > >>>>>> process... I > >>>>>>> am not sure and have not used it much but may be we can use it to > >> send > >>>>>> some > >>>>>>> notifications? Of course, it is applicable for graceful exits of > JVM > >>>>> only > >>>>>>> and if you just happen to kill the process it won't be of much > help. > >>>>>>> > >>>>>> The shutdown hook is used for shutting down. I'm not sure what is > the > >>>>>> purpose of mentioning it here? > >>>>>> > >>>>> GV : The reason I mentioned shutdown hook was it can be used to > >> send > >>>>> notification (may be email) or anything per our needs indicating that > >> the > >>>>> demo process was shut down. Per my understanding, shutdown hook > >> gets > >>>>> called whenever JVM shuts down gracefully. Graceful word is very > >> important > >>>>> here because we won't be able to do much if someone just kills the > >> process. > >>>>> The only thing a shutdown hook will add to this is that we will be > >> notified > >>>>> then and there. > >>>>> > >>>>>>> Hope it makes sense and correct me if I am wrong. > >>>>>> Well I'm struggling a bit. I didn't understand exactly what needs to > >> be > >>>>>> done? I see mixed topics about JMX, Mbeans, Memory monitors and > >> shutdown > >>>>>> hooks. First this seems to be more like coding than a tool, and > >> second I > >>>>>> have no idea how you want to implement this? > >>>>>> > >>>>> GV: Yes, it would mostly be coding rather than being a > substitute > >> for > >>>>> the tool. My idea was that to have a timer service run within the JVM > >> and > >>>>> it access various MBeans for the CPU usage and Memory usages just for > >> our > >>>>> monitoring purpose and raise an alert if it reaches a threshold. It > was > >>>>> just to have a glance over how JVM is performing. The disadvantage? > The > >>>>> service will run in OFBiz JVM and there will be considerable amount > of > >>>>> coding involved. > >>>>> > >>>>>> My idea for example is simple: create a cronjob that checks the > system > >>>>>> periodically and if the demo process stopped, restart it (or maybe > >>>>> rebuild > >>>>>> and restart). To go with your suggestion we need to perhaps first > >>>>>> understand it. > >>>>>> > >>>>> GV: There is nothing wrong with creating a CRON job, per se. The > >> only > >>>>> reason why I introduced MBeans in the mix was to be able to sort of > >> having > >>>>> OFBiz monitor itself within it's realm, hence use of MBeans. I > believe > >> a > >>>>> CRON will be able to do it as well. I probably did not get that we > >> probably > >>>>> want something that take some action after the JVM has crashed and > not > >>>>> having something that monitors the process and alerts concerned > parties > >>>>> that the process is occupying more than say 2 GB or it's CPU usage > has > >>>>> spiked above 80%. > >>>>> > >>>>> All in all, I feel we should choose the solution based on what we > want > >> to > >>>>> do and whether we want to take it further as well. I do not know what > >> the > >>>>> tool does now or whether it can build the system again and restart it > >>>>> automatically. I also do not know what measures we take in such an > >> event. I > >>>>> agree CRON will be simplest of them all, but if the tool provides all > >> of > >>>>> these (be able to take corrective measures) and not just send > >>>>> notifications, then it can also be worth it's salt. Yes, CRON will be > >> more > >>>>> technical way of achieving :) > >>>>> > >>>>> Thanks and Best regards, > >>>>> Girish Vasmatkar > >>>>> HotWax Systems > >>>>> > >>>>>>> Best regards, > >>>>>>> Girish Vasmatkar > >>>>>>> HotWax Systems > >>>>>>> > >>>>>>> > >>>>>>> On Thu, Aug 23, 2018 at 8:48 PM Jacques Le Roux < > >>>>>>> jacques.le.r...@les7arts.com> wrote: > >>>>>>> > >>>>>>>> Le 23/08/2018 à 14:04, Taher Alkhateeb a écrit : > >>>>>>>>> I'm not sure why you're hanging this on me, > >>>>>>>> Because you answered to the bait ;) > >>>>>>>> > >>>>>>>>> but sure I'm willing to > >>>>>>>>> help. > >>>>>>>> Thanks, much appreciated! > >>>>>>>> > >>>>>>>>> Can I get some information on how the crashes are happening and > >>>>>>>>> how you're getting notified, and I will take it from there. > >>>>>>>> I think after a crash it's mostly to use dumps there (we have > >> several > >>>>>>> from > >>>>>>>> the recent pas) but I'm not sure they will help, and it takes time > >> to > >>>>>>>> analyse. > >>>>>>>> > >>>>>>>> In the past I took the time to analyse some of them and it was > >>>>>>>> interesting. For instance in 2010 I found a bug in a Java version > we > >>>>>> were > >>>>>>>> using and it > >>>>>>>> helped me in a custom project I was also doing then: > >>>>>>>> https://markmail.org/message/byu2ivjn7wckayzz > >>>>>>>> > >>>>>>>> Lastly it was mostly lack of memory, despite having 8GB now. I > >>>>> created > >>>>>>>> https://issues.apache.org/jira/browse/INFRA-16780 for that, but > not > >>>>>> sure > >>>>>>>> it was > >>>>>>>> the reason. At least we have less issues since. > >>>>>>>> > >>>>>>>> Before (months ago) the Infra was monitoring our demos and > alerting > >>>>> us > >>>>>> by > >>>>>>>> mail (you just had to subscribe). Unfortunately we are on our own > >> for > >>>>>>> that > >>>>>>>> now, too much projects in the ASF... > >>>>>>>> As as I said initially in this thread I'm currently using > >>>>>> montastic.com > >>>>>>>> for the email alerts. > >>>>>>>> My idea when I started this thread was that it all depends on me, > >> and > >>>>>>>> that's bad. So I wanted people to be aware, you are much welcome. > >>>>>>>> > >>>>>>>> Jacques > >>>>>>>>> On Thu, Aug 23, 2018 at 2:29 PM Jacques Le Roux > >>>>>>>>> <jacques.le.r...@les7arts.com> wrote: > >>>>>>>>>> Yes we can, will you? > >>>>>>>>>> > >>>>>>>>>> Jacques > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Le 22/08/2018 à 19:29, Taher Alkhateeb a écrit : > >>>>>>>>>>> Well, we can ask Infra for help, we can check available > >>>>> solutions, > >>>>>> we > >>>>>>>>>>> can create a CRON script that checks things periodically, there > >>>>> are > >>>>>>>>>>> multiple ways to go about this. > >>>>>>>>>>> > >>>>>>>>>>> My personal preference is for a simple CRON script that takes > >>>>> care > >>>>>> of > >>>>>>>> this. > >>>>>>>>>>> On Wed, Aug 22, 2018 at 8:25 PM Jacques Le Roux > >>>>>>>>>>> <jacques.le.r...@les7arts.com> wrote: > >>>>>>>>>>>> So you prefer that I'm the only one to take care of the demos > >>>>> and > >>>>>>> act > >>>>>>>> on alerts? > >>>>>>>>>>>> Jacques > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Le 22/08/2018 à 18:53, Taher Alkhateeb a écrit : > >>>>>>>>>>>>> I prefer not to include any tools without proper analysis and > >>>>>>>>>>>>> discussion first. Less is more. > >>>>>>>>>>>>> On Wed, Aug 22, 2018 at 5:31 PM Jacques Le Roux > >>>>>>>>>>>>> <jacques.le.r...@les7arts.com> wrote: > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Should I consider no answers as a lazy consensus and should > I > >>>>>> send > >>>>>>>> (rare) alerts to this ML? > >>>>>>>>>>>>>> Without any answers I'll consider it a lazy consensus in 2 > >>>>> days. > >>>>>>>>>>>>>> Jacques > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Le 17/08/2018 à 12:22, Jacques Le Roux a écrit : > >>>>>>>>>>>>>>> Le 13/08/2018 à 18:21, Jacques Le Roux a écrit : > >>>>>>>>>>>>>>>> Le 12/08/2018 à 11:26, Jacques Le Roux a écrit : > >>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> This morning I noticed the old demo was down and > restarted > >>>>> it > >>>>>>>> after cleaning things. > >>>>>>>>>>>>>>>>> Previously (still some weeks ago) Daniel Gruno's (from > >>>>> Infra > >>>>>>>> team) company was kindly providing us a mean to monitor our demos > >> but > >>>>>> it > >>>>>>>> seems that > >>>>>>>>>>>>>>>>> this mean is no longer available > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I have asked about it and will let you know about it... > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Have a good weekend > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Jadques > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Daniel confirmed it's terminated. I turned to UpTimeRobot > >>>>>> which > >>>>>>>> is free and seems as well good :) > >>>>>>>>>>>>>>>> Jacques > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This thread started on user ML but I don't want to bother > >>>>>>> everyone > >>>>>>>> with technical details. > >>>>>>>>>>>>>>> I used my own @a.o email to create the monitoring. > >>>>> UpTimeRobot > >>>>>> is > >>>>>>>> certainly the best free monitoring tool, with some possibilities > >>>>> others > >>>>>>>> don't give. > >>>>>>>>>>>>>>> But the free version has an inconvenient. You can only > check > >>>>>>> every > >>>>>>>> 5 mins and when the instances restart it takes more than 5 mins > >> each. > >>>>>>>>>>>>>>> So everyday I get a down an up alerts for each. I have > >>>>> switched > >>>>>>> to > >>>>>>>> montastic.com. > >>>>>>>>>>>>>>> I was wondering if we don't want to share that here. > >>>>>>>>>>>>>>> We could then have these alerts here and any committer, > using > >>>>>> the > >>>>>>>> info inhttps://svn.apache.org/repos/asf/ofbiz/tools/demo-backup > >>>>> could > >>>>>>>> handle issues. > >>>>>>>>>>>>>>> It seems better, isn'it? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Jacques > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>> > >> > >