Hi, no luck.

201 | Sep-22-2018 | 00:23:34 | Sensor #0 | Memory | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
202 | Sep-29-2018 | 09:31:25 | Sensor #0 | Memory | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
203 | Oct-13-2018 | 19:31:34 | Sensor #0 | Memory | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
204 | Oct-20-2018 | 01:49:38 | Sensor #0 | Memory | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h

debug: http://termbin.com/3x02
10.110.32.36: [ 811h] = product_id[16b]

It seems that the X10SLM-F (not the X10SLM+-F) uses product ID 2065 (811h) instead of 2051.
You can check it in the full list: https://github.com/chu11/freeipmi-mirror/files/2651093/product_ids.txt

When patched with 2065:

201,Sep-22-2018,00:23:34,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
202,Sep-29-2018,09:31:25,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
203,Oct-13-2018,19:31:34,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)
204,Oct-20-2018,01:49:38,Sensor #0,Memory,Warning,Correctable memory error ; DIMMB2(CPU1)

Voila :)

Best,
Tom Hetmer

CDN77 Operations
supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

----- Original message -----
> From: "Albert Chu" <ch...@llnl.gov>
> To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> Date: 12/12/18 02:18
> Subject: Re: Re[4]: [Freeipmi-users] Decoding ram errors on supermicro
>
> Hey Tom,
>
> I got a branch on github with (what I hope) is support for the
> X10SLM+-F. Could you give it a shot? The branch is called "supermicro_dimm".
>
> https://github.com/chu11/freeipmi-mirror/tree/supermicro_dimm
>
> ./autogen.sh
> ./configure
> make
> ipmi-sel/ipmi-sel --interpret-oem-data
> (add remote connection options as needed to ipmi-sel)
>
> If that doesn't work, could you do the following:
>
> ipmi-sel/ipmi-sel --debug --display=201
>
> (I picked 201 as one of the DIMM events in the output below. It doesn't
> have to be that one, just any specific DIMM SEL event.)
>
> Thanks,
>
> Al
>
> On Tue, 2018-12-11 at 13:33 +0100, Tom Hetmer wrote:
> > Supermicro (after pointing me to web interface and SNMP...):
> > "Sorry, we do not have this Information at our support desk. you can
> > request this via your sales channel, but it can be that you would
> > need to sign an NDA for such information."
> >
> > So we're on our own, I don't have any better contact as we buy from a
> > reseller.
> > Besides, they'd want an NDA for those 3 lines of code.
> >
> > Best,
> > Tom Hetmer
> >
> > CDN77 Operations
> > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> >
> > ----- Original message -----
> > From: "Tom Hetmer" <tomas.het...@cdn77.com>
> > To: "Al Chu" <ch...@llnl.gov>, freeipmi-users@gnu.org
> > Date: 12/11/18 12:09
> > Subject: Re[3]: [Freeipmi-users] Decoding ram errors on supermicro
> >
> > Hey,
> >
> > so that was fast - we've got an older X10SLM-F rented by a customer.
> >
> > IPMI web says
> > 201 2018/09/22 00:23:34 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
> > 202 2018/09/29 09:31:25 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
> > 203 2018/10/13 19:31:34 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
> > 204 2018/10/20 01:49:38 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
> >
> > freeipmi:
> > ID | Date | Time | Name | Type | State | Event
> > 7 | Jan-21-2016 | 15:26:16 | FANA | Fan | Critical | Lower Critical - going low ; Sensor Reading = 0.00 RPM ; Threshold = 600.00 RPM
> > 8 | Jan-21-2016 | 15:26:16 | FANA | Fan | Critical | Lower Non-recoverable - going low ; Sensor Reading = 0.00 RPM ; Threshold = 400.00 RPM
> > 9 | Jan-21-2016 | 15:26:25 | FANA | Fan | Critical | Lower Non-recoverable - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 400.00 RPM
> > 10 | Jan-21-2016 | 15:26:25 | FANA | Fan | Warning | Lower Critical - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 600.00 RPM
> > 201 | Sep-22-2018 | 00:23:34 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
> > 202 | Sep-29-2018 | 09:31:25 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
> > 203 | Oct-13-2018 | 19:31:34 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
> > 204 | Oct-20-2018 | 01:49:38 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
> >
> > We'll ask the customer for downtime to replace it, all should then be
> > correct as it's official data from supermicro's own interface.
> >
> > Best,
> > Tom Hetmer
> >
> > CDN77 Operations
> > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> >
> > ----- Original message -----
> > From: "Tom Hetmer" <tomas.het...@cdn77.com>
> > To: freeipmi-users@gnu.org, "Al Chu" <ch...@llnl.gov>
> > Date: 12/11/18 11:59
> > Subject: Re[2]: [Freeipmi-users] Decoding ram errors on supermicro
> >
> > Hi,
> >
> > it appears we have no ECC errors on the servers we directly own right
> > now.
> > I can let you know when we get one though.
> >
> > We rent out some machines to customers as well, maybe there are some
> > errors there => my colleague will check the report today.
> >
> > I also created a ticket with Supermicro to see if they can confirm
> > we're looking at the right code/add any official details.
> >
> > Best,
> > Tom Hetmer
> >
> > CDN77 Operations
> > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> >
> > ----- Original message -----
> > > From: "Al Chu" <ch...@llnl.gov>
> > > To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> > > Date: 12/11/18 02:28
> > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > >
> > > Hey Tom,
> > >
> > > Is there a specific motherboard (amongst the product IDs you mentioned
> > > below) you have with a dimm error that we can test on? To make sure I
> > > don't make a major mistake, I'd like to code to 1 motherboard first.
> > >
> > > Thanks,
> > > Al
> > >
> > >
> > > On Wed, 2018-12-05 at 10:48 -0800, Albert Chu wrote:
> > > > On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> > > > > Alright, added to github.
> > > > >
> > > > > Here's the output from bmc-info for that particular board.
> > > > > Product ID : 2201
> > > > > [Mon Dec 3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4,
> > > > > BIOS 2.0 01/30/2016
> > > > >
> > > > >
> > > > > I guess you'll support it based on the product ID?
> > > >
> > > > Yes! Thanks. I'll put these in the ticket too.
> > > >
> > > > Al
> > > >
> > > > > So if there are any other (X10) boards with different product ID
> > > > > but the same SEL output I'll have to send it again, correct?
> > > > >
> > > > >
> > > > > I have all kinds of numbers on other machines, e.g.
> > > > > X10DRW-E => 2148
> > > > > X11SPi-TF => 2369
> > > > > X10SLL-F => 2049
> > > > > X10DRL-i => 2097
> > > > > X11DDW-NT => 2407
> > > > > X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051
> > > > >
> > > > >
> > > > > and so on.. I think we have at least 1/4 of the boards they
> > > > > manufacture.
> > > > > X9s are under 2000, X11 seems to be 23xx. But that's maybe too much
> > > > > reverse engineering for you ;)
> > > > > I can try to ping them and ask about details but I have no official
> > > > > contact with Supermicro.
> > > > >
> > > > >
> > > > > Best,
> > > > > Tom Hetmer
> > > > >
> > > > >
> > > > > CDN77 Operations
> > > > > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> > > > >
> > > > > ----- Original message -----
> > > > > > From: "Albert Chu" <ch...@llnl.gov>
> > > > > > To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> > > > > > Date: 12/04/18 19:40
> > > > > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > > > >
> > > > > > On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> > > > > > > Sure. It seems there's a similar ticket
> > > > > > > already: https://github.com/chu11/freeipmi-mirror/issues/19
> > > > > >
> > > > > > Ahh, if you could, update it with info from ipmitool / ipmiutil.
> > > > > > I was reluctant to add support based on reverse engineering. But
> > > > > > if other tools have "official" interpretations from Supermicro,
> > > > > > I'm more confident in the addition.
> > > > > >
> > > > > > > Yep, that's the code. ipmitool and a few others decode it too.
> > > > > > >
> > > > > > >
> > > > > > > We have a *lot* of Supermicros so I can help with testing if
> > > > > > > needed - but we don't get that many CRC errors though :)
> > > > > >
> > > > > > The one thing I'll need is product ID numbers (you can get them
> > > > > > from bmc-info) and the name of the product. This goes into the
> > > > > > documentation and some of the code.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Al
> > > > > >
> > > > > > > So I guess we'd have to wait till one pops up. But I hope the
> > > > > > > 'ver 2' method from ipmiutil works fine.
> > > > > > > We used ipmitool in our monitoring before and it was accurate
> > > > > > > but slow, that's why I rewrote it all to use freeipmi.
> > > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Tom Hetmer
> > > > > > >
> > > > > > >
> > > > > > > CDN77 Operations
> > > > > > > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> > > > > > >
> > > > > > > ----- Original message -----
> > > > > > > > From: "Albert Chu" <ch...@llnl.gov>
> > > > > > > > To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> > > > > > > > Date: 12/03/18 21:06
> > > > > > > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > > > > > >
> > > > > > > > Hi Tom,
> > > > > > > >
> > > > > > > > Thanks for the pointer to ipmiutil's code.
> > > > > > > > I assume you found this comment:
> > > > > > > >
> > > > > > > > ---
> > > > > > > > /* ver 2 method: 2A 80 = P1_DIMMB1 */
> > > > > > > > /* SuperMicro says:
> > > > > > > >  * pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
> > > > > > > >  * dimm: %c (data2 & 0xf) + 0x27,
> > > > > > > >  * cpu: %x (data3 & 0x03) + 1);
> > > > > > > >  */
> > > > > > > > ---
> > > > > > > >
> > > > > > > > I can definitely add it to my todo list.
> > > > > > > >
> > > > > > > > Would you mind writing up an issue on github here?
> > > > > > > >
> > > > > > > > https://github.com/chu11/freeipmi-mirror
> > > > > > > >
> > > > > > > > Al
> > > > > > > >
> > > > > > > > On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > it'd be good if freeipmi supported decoding the supermicro
> > > > > > > > > ECC errors.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Manufacturer: Supermicro
> > > > > > > > > Product Name: X10DRH LN4
> > > > > > > > > eg.
> > > > > > > > > freeipmi
> > > > > > > > > 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > web interface
> > > > > > > > > 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC (@DIMMG1(CPU2)) | Asserted
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > something like this worked for me (stolen from ipmiutil)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $cpu = ($data3 & 0x03) + 1;
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $NPAIRS = 26;
> > > > > > > > > $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $bdata = "0x".$data2.$data3;
> > > > > > > > > $bdata = hexdec($bdata);
> > > > > > > > > $pair = (($bdata & 0xF0) >> 4) - 1;
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > if ($pair < 0) $pair = 0;
> > > > > > > > > if ($pair > $NPAIRS) $pair = $NPAIRS - 1;
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $pair = $rgpairs[$pair - 1];
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $dimm = $bdata & 0x0F;
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $dimm may be incorrect as the original code decrements 9,
> > > > > > > > > but on that board it was wrong so i changed it to get the
> > > > > > > > > right result - we'll see if it keeps getting the right values.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Tom Hetmer
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > CDN77 Operations
> > > > > > > > > supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Freeipmi-users mailing list
> > > > > > > > > Freeipmi-users@gnu.org
> > > > > > > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> > > > > > > >
> > > > > > > > --
> > > > > > > > Albert Chu
> > > > > > > > ch...@llnl.gov
> > > > > > > > Computer Scientist
> > > > > > > > High Performance Systems Division
> > > > > > > > Lawrence Livermore National Laboratory
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Freeipmi-users mailing list
> > > > > > > Freeipmi-users@gnu.org
> > > > > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> > > > > >
> > > > > > --
> > > > > > Albert Chu
> > > > > > ch...@llnl.gov
> > > > > > Computer Scientist
> > > > > > High Performance Systems Division
> > > > > > Lawrence Livermore National Laboratory
> > > > >
> > > > > _______________________________________________
> > > > > Freeipmi-users mailing list
> > > > > Freeipmi-users@gnu.org
> > > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> --
> Albert Chu
> ch...@llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-users mailing list
Freeipmi-users@gnu.org
https://lists.gnu.org/mailman/listinfo/freeipmi-users
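
For reference, the "ver 2" formulas from the ipmiutil comment quoted in this thread can be tried against the raw "OEM Event Data2/Data3" codes shown above. The following is only a minimal Python sketch of that arithmetic: the function names and the regular expression are hypothetical and made up for illustration, and this is not how freeipmi itself implements the decoding.

```python
import re

# Hypothetical helper: matches the raw codes as printed in the SEL output
# quoted in this thread, e.g. "OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h".
_OEM_CODES = re.compile(
    r"OEM Event Data2 code = ([0-9A-Fa-f]{2})h\s*;\s*"
    r"OEM Event Data3 code = ([0-9A-Fa-f]{2})h"
)


def decode_dimm(data2: int, data3: int) -> str:
    """Apply the 'ver 2' formulas from the quoted ipmiutil comment."""
    cpu_index = data3 & 0x03
    # pair: (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, printed as a character
    pair = chr((data2 >> 4) + 0x40 + cpu_index * 3)
    # dimm: (data2 & 0xf) + 0x27, printed as a character
    dimm = chr((data2 & 0x0F) + 0x27)
    # cpu: (data3 & 0x03) + 1
    return f"DIMM{pair}{dimm}(CPU{cpu_index + 1})"


def decode_sel_event(event: str):
    """Return a DIMM label for an event string, or None if no codes are found."""
    match = _OEM_CODES.search(event)
    if match is None:
        return None
    data2, data3 = (int(code, 16) for code in match.groups())
    return decode_dimm(data2, data3)


if __name__ == "__main__":
    event = ("Correctable memory error ; OEM Event Data2 code = 2Bh ; "
             "OEM Event Data3 code = 80h")
    print(decode_sel_event(event))  # prints DIMMB2(CPU1)
```

For the X10SLM-F events (2Bh / 80h) this gives DIMMB2(CPU1), which agrees with the IPMI web interface output quoted above. For the X10DRH event (3Ah / 81h) the same formulas give DIMMF1(CPU2), while the web interface reported DIMMG1(CPU2), so the constants should be treated as illustrative and may differ per board.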