Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
2010/12/1 John Bayly freebsd.po...@tipstrade.net

> On 01/12/2010 14:17, Arjen de Korte wrote:
>> Quoting Charles Lepple clep...@gmail.com:
>>> The get_belkin_reply() function looks fragile to me. [...]
>>
>> Starting with ser_flush_io(upsfd);
>
> Thanks for the suggestions, I've added the flush statement as well as
> some debugging information. As this is an intermittent issue I decided
> to try overloading the UPS by sending it repeated beeper commands while
> watching the debug output. What appears to happen is that the UPS
> returns an unknown ~00R000 response. This means get_belkin_reply()
> returns -1, causing a datastale state to be set when called from
> do_status().

You should remove the datastale() call, since upsd will automatically flag the device as stale if it has failed to update its data for 15 seconds (the default of MAXAGE).

cheers,
Arnaud
--
Linux / Unix Expert R&D - Eaton - http://powerquality.eaton.com
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://www.debian.org
Free Software Developer - http://arnaud.quette.free.fr/
___
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/nut-upsuser
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
Quoting Arnaud Quette aquette@gmail.com:

>> Thanks for the suggestions, I've added the flush statement as well as
>> some debugging information. As this is an intermittent issue I decided
>> to try overloading the UPS by sending it repeated beeper commands while
>> watching the debug output. What appears to happen is that the UPS
>> returns an unknown ~00R000 response. This means get_belkin_reply()
>> returns -1, causing a datastale state to be set when called from
>> do_status().
>
> you should remove the datastale() call since upsd will automatically
> flag the device as stalled if it has failed to update its data for
> 15 seconds (default of MAXAGE).

Not at all! The upsd server will only declare the *driver* stale if it fails to respond within MAXAGE seconds. However, as long as the driver keeps answering the PING from the server, it will not be declared stale. This mechanism is completely different from what happens when the driver calls dstate_datastale(). In that case the driver tells the upsd server that the *UPS* fails to respond. See the chapter on "Staleness control" in docs/new-drivers.txt.

What really needs to be done is that the driver stops treating the ~00R000 reply as an error condition. Apparently the UPS acknowledges the receipt of data without any further response (indicating that 0 bytes follow). The belkin driver doesn't accept this at the moment and requires that a reply follows. This is what needs to be changed.

Last but not least, in most drivers we allow a couple of missed replies before we call dstate_datastale(), so that glitches don't lead to automatic reconnects.

Best regards, Arjen
--
Please keep list traffic on the list (off-list replies will be rejected)
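To make the point about ~00R000 concrete, here is a minimal sketch of a header check that accepts a zero-length reply. It is not taken from the belkin driver: the function name parse_reply_header() is hypothetical, and the header layout (a '~', three further characters, then a three-digit byte count, seven characters in all) is an assumption inferred from the ~00R000 example in this thread.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of the NUT belkin driver.
 * Assumed header layout, inferred from "~00R000": a '~', three more
 * characters, then a 3-digit count of the payload bytes that follow. */
#define HDR_LEN 7

/* Returns the payload length (0..255), or -1 if the header is malformed. */
int parse_reply_header(const char *hdr)
{
	int i, cnt = 0;

	if (hdr == NULL || strlen(hdr) != HDR_LEN || hdr[0] != '~')
		return -1;

	/* The last three characters must all be decimal digits. */
	for (i = 4; i < HDR_LEN; i++) {
		if (!isdigit((unsigned char)hdr[i]))
			return -1;
		cnt = cnt * 10 + (hdr[i] - '0');
	}

	/* A count of 0 is a valid acknowledgement with no data following. */
	return (cnt > 255) ? -1 : cnt;
}
```

With this shape, a caller can tell "acknowledged, nothing to read" (0) apart from "garbled header" (-1) and only treat the latter as a communication failure.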
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
Quoting John Bayly freebsd.po...@tipstrade.net:

>> Last but not least, in most drivers, we allow a couple of missed
>> replies before we call dstate_datastale() so that glitches don't lead
>> to automatic reconnects.
>
> Can you suggest what driver would be a good template to use?

Take a look at the upsdrv_updateinfo() function in the 'blazer.c' driver core.

Best regards, Arjen
--
Please keep list traffic on the list (off-list replies will be rejected)
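The "allow a couple of missed replies" idea can be sketched roughly as below. This is not the code from blazer.c; struct poll_state, poll_update() and the threshold of 3 are assumptions made for illustration. In a real driver the two marked branches are where dstate_dataok() and dstate_datastale() would be called.

```c
#include <stdbool.h>

#define MAXTRIES 3	/* assumed number of tolerated consecutive failures */

/* Hypothetical bookkeeping for consecutive failed polls. */
struct poll_state {
	int  retry;	/* consecutive failed polls so far */
	bool stale;	/* true once the data has been declared stale */
};

/* Record one poll result; returns true if the data should be stale. */
bool poll_update(struct poll_state *ps, bool poll_ok)
{
	if (poll_ok) {
		ps->retry = 0;
		ps->stale = false;	/* a driver would call dstate_dataok() here */
	} else if (ps->retry < MAXTRIES) {
		ps->retry++;		/* tolerate a glitch, keep the old data */
	} else {
		ps->stale = true;	/* a driver would call dstate_datastale() here */
	}

	return ps->stale;
}
```

Starting from { 0, false }, three failed polls in a row leave the data valid, a fourth marks it stale, and a single good poll resets the counter.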
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On 02/12/2010 10:54, Arjen de Korte wrote:

> Quoting John Bayly freebsd.po...@tipstrade.net:
>
>>> Last but not least, in most drivers, we allow a couple of missed
>>> replies before we call dstate_datastale() so that glitches don't lead
>>> to automatic reconnects.
>>
>> Can you suggest what driver would be a good template to use?
>
> Take a look at the upsdrv_updateinfo() function in the 'blazer.c'
> driver core.
>
> Best regards, Arjen

So, I've made the following changes:

In get_belkin_reply(), allow responses with no trailing data:

    -    if ((cnt < 1) || (cnt > 255))
    +    if (cnt == 0) {    /* possible to have ~00R000, return empty response */
    +        buf[0] = 0;
    +        return 0;
    +
    +    } else if ((cnt < 1) || (cnt > 255))
             return -1;

Added a method to return the UPS status; NULL is returned if no status is available:

    static const char *get_status(void)
    {
        char        temp[SMALLBUF], st[SMALLBUF];
        int         res;
        const char  *status = NULL;

        send_belkin_command(STATUS, STAT_STATUS, "");
        res = get_belkin_reply(temp);
        if (res < 1)
            return NULL;

        get_belkin_field(temp, st, sizeof(st), 6);
        if (*st == '1') {
            status = "OFF";
        } else if (*st == '0') {    /* (OFF) and (OB | OL) are mutually exclusive */
            get_belkin_field(temp, st, sizeof(st), 2);
            if (*st == '1') {
                status = "OB";    /* on battery */
                send_belkin_command(STATUS, STAT_BATTERY, "");
                res = get_belkin_reply(temp);
                if (res < 1) {    /* no battery info, so no reliable status */
                    status = NULL;
                } else {
                    get_belkin_field(temp, st, sizeof(st), 10);
                    res = atoi(st);
                    get_belkin_field(temp, st, sizeof(st), 2);
                    if (*st == '1' || res < LOW_BAT)
                        status = "LB";    /* low battery */
                }
            } else if (*st == '0') {
                status = "OL";    /* on line */
            }
        }
        return status;
    }

Modified do_status(); it calls get_status() and allows for MAXTRIES (3):

    static int do_status(void)
    {
        /* fetch the UPS status, or NULL if unavailable */
        const char  *status = get_status();

        if (status) {
            if (retry)    /* previous attempt had failed */
                upslogx(LOG_WARNING, "Communications with UPS re-established");
            status_init();
            status_set(status);
            status_commit();
            dstate_dataok();
            retry = 0;
            return 1;
        } else {
            if (retry < MAXTRIES) {
                upslogx(LOG_WARNING, "Communications with UPS lost: status read failed!");
                retry++;
            } else {    /* too many retries */
                dstate_datastale();
            }
            return 0;
        }
    }

C isn't my native language, so I'd appreciate any feedback, negative or (preferably) positive :-)
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On Thursday, December 02, 2010 10:05:54 am Arjen de Korte did opine:

> [...]
> Last but not least, in most drivers, we allow a couple of missed
> replies before we call dstate_datastale() so that glitches don't lead
> to automatic reconnects.
>
> Best regards, Arjen

I've been sitting here following this thread and wondering if the OP has told us everything?

He may indeed be using serial at the UPS, but if he has a pl2303 serial-to-USB adapter in the signal path and is using a ttyUSB# connection, then it is possible that the pl2303 adapter is eating his lunch, specifically the first byte of a packet at frequent intervals. This will confuse virtually all upsd implementations regardless of whose upsd it is, including Belkin's own, now Jurassic-dated, Bulldog software.

Most of the more modern Belkin UPSs do conform to the usb-hid specs, and I have had zero problems with loss of comm with mine over a pure USB circuit:

    usb 2-9: new low speed USB device using ohci_hcd and address 5
    usb 2-9: New USB device found, idVendor=050d, idProduct=0751
    usb 2-9: New USB device strings: Mfr=4, Product=20, SerialNumber=0
    usb 2-9: Product: Belkin UPS
    usb 2-9: Manufacturer: Belkin

It is a 1500 WA rated device also. I have another 1500WA rated Belkin, several years older and on its 4th set of batteries, that either isn't usb-hid conformant, or when I last tried to run NUT against it, NUT's usb-hidraw wasn't up to speed. It is now running my milling machine's computer. That computer is running Ubuntu 10.04, but emc is fussy about what you plug into a USB port. A USB key, for instance, is a guaranteed wrecked part because of the huge IRQ lockout times associated with the challenge/response time of the key as the I/O scheduler makes sure all the caches associated with it have been flushed. That is from lessons learned while talking to myself. ;-)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Intel CPUs are not defective, they just act that way.
-- Henry Spencer
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On 02/12/2010 15:28, Gene Heskett wrote:

> I've been sitting here following this thread and wondering if the OP
> has told us everything?
>
> He may indeed be using serial at the ups, but if he has a pl2303
> ser-usb adapter in the signal path and is using a ttyUSB# connection,
> then there could be a possibility that the pl2303 adapter is eating
> his lunch, specifically the first byte of a packet at frequent
> intervals [...]

Nope, it's definitely serial; the UPS is on the D9 port (/dev/cuad0). I'm using the belkin driver, not the belkinunv or usb-hid drivers. Unfortunately Belkin seem to have disavowed all knowledge of the device, as it's nowhere to be found on their website. The best description of it is on a reseller's site:
http://uk.insight.com/p/497211/belkin-regulator-pro-network-ups-ups-1400-va.html
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On Thursday, December 02, 2010 04:36:43 pm John Bayly did opine:

> Nope, it's definitely serial, UPS is on the D9 port (/dev/cuad0). I'm
> using the belkin driver, not the belkinunv or usb-hid drivers.
> Unfortunately Belkin seem to have disavowed all knowledge of the device
> as it's nowhere to be found on their website. Best description of it on
> a reseller's site:
> http://uk.insight.com/p/497211/belkin-regulator-pro-network-ups-ups-1400-va.html

That appears to be the Euro version of my older one, same box and front panel.

And my SNMP slot was empty, so I did not re-install the card slot for it when I last had it apart last spring to replace the batteries. I had to dismantle it quite far, as the old ones had swelled and were bound in the frame. This one does not have a USB port, although it looks as if there might be a 9-pin USB header on its controller board: a dual row of 5 with one of the end pins missing. In fact, I wonder if a standard computer breakout (back panel to motherboard USB kit) might actually work? I have a spare of those, and the next time I haul it off the shelf (it's 6+ feet up in the air, sitting on a brace across the rafters in my shop building, and pretty heavy for the old man to get down and back up), I might just see what I blow if I hook it up to a USB port. I will probably be needing batteries again by then, and if I let the smoke out or break the mirror, I have had close to 10 years out of it anyway.

--
Cheers, Gene
There are four boxes to be used in defense of
[Nut-upsuser] Belkin Regulator Pro dropping connection and halting
I've a Belkin Regulator Pro (F6C1400-EUR) connected via serial to a FreeBSD machine using NUT v.2.4.3.

Sometimes I get a series of logged messages saying that the data is switching between stale and valid. This in itself isn't an issue; however, occasionally when the communication is re-established, upsmon gets an "On battery" message followed quickly by a "Battery low" message, and calls on the system to halt. I know for a fact that the battery isn't low at any stage, as other UPSs have not reported a loss of power. This behaviour has only started occurring since using NUT rather than Belkin Bulldog (not supported on FreeBSD x64).

Dec 1 08:02:46 rack upsd[941]: UPS [belkinreg] data is no longer stale
Dec 1 08:02:47 rack upsmon[957]: Communications with UPS belkin...@localhost established
Dec 1 08:02:48 rack upsd[941]: Data for UPS [belkinreg] is stale - check driver
Dec 1 08:02:52 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
Dec 1 08:02:52 rack upsmon[957]: Communications with UPS belkin...@localhost lost
Dec 1 08:02:57 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
Dec 1 08:02:59 rack upsd[941]: UPS [belkinreg] data is no longer stale
Dec 1 08:03:00 rack upsd[941]: Data for UPS [belkinreg] is stale - check driver
Dec 1 08:03:02 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
Dec 1 08:03:27 rack last message repeated 5 times
Dec 1 08:03:32 rack upsd[941]: UPS [belkinreg] data is no longer stale
Dec 1 08:03:32 rack upsmon[957]: Communications with UPS belkin...@localhost established
Dec 1 08:03:34 rack upsd[941]: Data for UPS [belkinreg] is stale - check driver
Dec 1 08:03:37 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
Dec 1 08:03:37 rack upsmon[957]: Communications with UPS belkin...@localhost lost
Dec 1 08:03:42 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
...
Dec 1 08:03:52 rack upsmon[957]: Poll UPS [belkin...@localhost] failed - Data stale
Dec 1 08:03:52 rack upsmon[957]: Communications with UPS belkin...@localhost lost
Dec 1 08:03:57 rack upsd[941]: UPS [belkinreg] data is no longer stale
Dec 1 08:03:57 rack upsmon[957]: Communications with UPS belkin...@localhost established
Dec 1 08:03:57 rack upsmon[957]: UPS belkin...@localhost on battery
Dec 1 08:03:57 rack upsmon[957]: UPS belkin...@localhost battery is low
Dec 1 08:03:57 rack upsmon[957]: Executing automatic power-fail shutdown
Dec 1 08:03:57 rack upsmon[957]: Auto logout and shutdown proceeding

I can see no reason why NUT is seeing a battery low message. Unfortunately there is only a 5 minute resolution for the ups log:

20101201 072111 100 238.7 022 [OL] 012 50.1
20101201 072611 NA NA NA [NA] NA NA
20101201 073111 NA NA NA [NA] NA NA
20101201 073611 000 000.0 2400 [OL] 000 0.0
20101201 074111 NA NA NA [NA] NA NA
20101201 074611 NA NA NA [NA] NA NA
20101201 075111 NA NA NA [NA] NA NA
20101201 075611 NA NA NA [NA] NA NA
20101201 080111 NA NA NA [NA] NA NA
20101201 090322 NA NA NA [WAIT] NA NA
20101201 090822 100 234.5 024 [OL] 017 49.9
20101201 091348 NA NA NA [WAIT] NA NA
20101201 091848 100 236.7 022 [OL] 017 50.0
20101201 092348 100 235.7 022 [OL] 017 49.9

Has anyone got any suggestions?

John
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On Dec 1, 2010, at 7:42 AM, John Bayly wrote:

> I've a Belkin Regulator Pro (F6C1400-EUR) connected via serial to a
> FreeBSD machine using NUT v.2.4.3
>
> Sometimes I get a series of logged messages saying that the data is
> switching between stale and valid, this in itself isn't an issue,
> however occasionally when the communication is re-established, upsmon
> gets a On battery message followed quickly by Battery low message, and
> calls on the system to halt. [...]

I would suggest temporarily disabling the automated shutdown (maybe replacing it with some sort of notification), and running the driver with debugging enabled; however, there don't seem to be many upsdebugx() calls in the code.

The get_belkin_reply() function looks fragile to me. Three seconds should be enough to fill the buffer, but if you put a few upsdebugx() calls around ser_get_buf_len(), it should be evident whether the read is timing out, or if there is a problem with the format of the response.

--
Charles Lepple
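As a rough illustration of the kind of check those debug calls would support, the sketch below separates "the read came back short" from "the bytes arrived but look wrong". It is only a sketch under assumptions: check_header_read() and the enum are hypothetical names, the 7-byte header length and leading '~' are inferred from the ~00R000 reply seen in this thread, and in the driver the resulting message would be handed to upsdebugx() rather than returned in a buffer.

```c
#include <stdio.h>

#define HDR_LEN 7	/* assumed header size, e.g. "~00R000" */

enum read_result { READ_OK, READ_SHORT, READ_BAD_FORMAT };

/* Hypothetical helper: classify the outcome of a header read and build a
 * one-line diagnostic. 'ret' is the byte count reported by the read call,
 * 'buf' holds the NUL-terminated bytes that were received. */
enum read_result check_header_read(int ret, const char *buf,
                                   char *msg, size_t msglen)
{
	if (ret != HDR_LEN) {
		/* Too few bytes (or an error code): most likely a timeout. */
		snprintf(msg, msglen, "header read returned %d, wanted %d bytes",
		         ret, HDR_LEN);
		return READ_SHORT;
	}

	if (buf[0] != '~') {
		/* Right length, wrong framing: a format problem. */
		snprintf(msg, msglen, "unexpected header start 0x%02x in '%s'",
		         (unsigned char)buf[0], buf);
		return READ_BAD_FORMAT;
	}

	snprintf(msg, msglen, "header ok: '%s'", buf);
	return READ_OK;
}
```

Logging the raw header in every branch is the useful part: it is what would have shown the ~00R000 reply directly instead of a bare failure.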
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
Quoting Charles Lepple clep...@gmail.com:

> The get_belkin_reply() function looks fragile to me. Three seconds
> should be enough to fill the buffer, but if you put a few upsdebugx()
> calls around ser_get_buf_len(), it should be evident whether the read
> is timing out, or if there is a problem with the format of the
> response.

Starting with

    ser_flush_io(upsfd);

in the send_belkin_command function (before the ser_send call) might also help. It doesn't look like the driver deals with partial replies gracefully.

Best regards, Arjen
--
Please keep list traffic on the list (off-list replies will be rejected)
Re: [Nut-upsuser] Belkin Regulator Pro dropping connection and halting
On 01/12/2010 14:17, Arjen de Korte wrote:

> Quoting Charles Lepple clep...@gmail.com:
>
>> The get_belkin_reply() function looks fragile to me. [...]
>
> Starting with
>
> ser_flush_io(upsfd);

Thanks for the suggestions, I've added the flush statement as well as some debugging information. As this is an intermittent issue I decided to try overloading the UPS by sending it repeated beeper commands while watching the debug output. What appears to happen is that the UPS returns an unknown ~00R000 response. This means get_belkin_reply() returns -1, causing a datastale state to be set when called from do_status().

> in the send_belkin_command function (before the ser_send call) might
> also help. It doesn't look like the driver deals with partial replies
> gracefully.

Accepted; however, not dealing with a partial reply means that it will cause a datastale state, and there should be no way for it to cause OB or LB states.

> Best regards, Arjen

I've also *beefed* up the logic in do_status(). It was assuming that if certain fields were not 1 they would be 0; instead, I've made sure it checks whether the fields are either 0 or 1, as it's possible the UPS could return blank fields. Of course, I don't know if this is the cause, but it's the only thought I have.

Regards, John
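The stricter field check described above might look something like the following. This is an illustrative sketch, not the actual change: parse_flag() and derive_status() are hypothetical names, and the two inputs stand in for the "off" and "on battery" fields of a status reply. The point is that a blank or garbled field yields no status at all rather than silently falling through to one of the states.

```c
#include <stddef.h>

/* Tri-state result: a field is set, clear, or not usable. */
enum flag { FLAG_UNKNOWN = -1, FLAG_CLEAR = 0, FLAG_SET = 1 };

/* Hypothetical decoder: only '0' and '1' are accepted. */
static enum flag parse_flag(const char *field)
{
	if (field == NULL)
		return FLAG_UNKNOWN;
	if (field[0] == '1')
		return FLAG_SET;
	if (field[0] == '0')
		return FLAG_CLEAR;
	return FLAG_UNKNOWN;	/* blank or garbled field */
}

/* Returns "OFF", "OB" or "OL", or NULL when either field is unusable. */
const char *derive_status(const char *off_field, const char *onbatt_field)
{
	switch (parse_flag(off_field)) {
	case FLAG_SET:
		return "OFF";
	case FLAG_CLEAR:
		break;
	default:
		return NULL;
	}

	switch (parse_flag(onbatt_field)) {
	case FLAG_SET:
		return "OB";
	case FLAG_CLEAR:
		return "OL";
	default:
		return NULL;
	}
}
```

A NULL result can then be counted as a missed reply instead of being reported as a power event.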