Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
On Fri, 2006-11-17 at 09:53 -0600, Corey Minyard wrote:
> > oopses (appended below). Does this patch require one or more of the
> > other patches in the 39.1 set to be happy (for instance, the
> > allow-hot-smi-remove patch), or am I running into some other issue?
> >
> I looked at this yesterday and today, I cannot figure out what would be
> different between the two scenarios. I could not reproduce this, and
> it's probably best to just take all the patches as that is what I
> tested. I did test different loading orders in my testing.
>
> I'll try to look at this again today.

I totally agree that it would be best to use all the patches rather than
to just pull out a few of them and I would always prefer to do that but
in the case of the particular scenario/issue I'm currently working on,
it won't be possible in the near-term. :-(

I did try loading the ipmi-allow-hot-smi-remove patch along with the
other 4 patches I listed and it did seem to fix the driver-load-order
oops. Is that a stand-alone patch or are there others in the set that
need to be loaded along with it? I ran some tests with this new 5-patch
subset and didn't find any problems but my testing wasn't exhaustive so
I'm hoping to verify that grabbing only these 5 alone won't introduce
some other issue in some area I didn't touch in my testing.

> > I got a bit of info about the order in which the SMBIOS table is
> > populated and found out that it's currently populated in order of
> > increasing KCS I/O address but that this isn't necessarily an ordering
> > scheme that can be assumed for the future. Also, regarding changing the
> > BIOS to make the deviceID unique across BMCs, I was told that if these
> > changes were made, we would likely be facing many issues such as
> > DeviceID mismatches with what's coded up in the SDR data, etc. So I
> > suspect it's something that might not happen anytime soon (if ever).
> >
> That really doesn't make any sense. The only place I could find where
> this Device ID is used is in the type 13 SDR: "Management Controller
> Confirmation Record". This record is used by utility software to record
> that it found a specific management controller in the system. It seems
> of limited value to me, anyway, and having different device IDs would
> seem to make this easier, not harder, to identify the different
> management controllers. From what I can tell, the use of this is for
> system software to record the current management controller
> configuration. Then if system software finds something different, it
> can say "Hey, something changed" and handle it.
>
> Note that the term "Device ID" is heavily overloaded in the IPMI spec.
> It also has "FRU Device ID" and "Device ID String", but those are
> completely different things.
>
> I see no other reliable mechanism to correlate management controllers
> with nodes, especially if nodes ever become dynamic. I really doubt you
> will have any issues unless you have software that is hardcoded to
> handle this. That doesn't seem so, since they are all the same and it
> doesn't provide any real useful information. Perhaps the group doing
> the work can suggest a reliable way to correlate the nodes and the
> management controllers?
>

Thanks very much for your input on this. I'll take what you've said back
to the BIOS folks and re-open the discussion. :-)

Thank you very much again for your ongoing and excellent help. :-)

Carol

___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote:
> Hi Corey,
>
> I wanted to let you know about some of the testing I've done with some
> of the new 39.1 patches and also to ask you about an issue I found.
>
> First, I wanted to ask you about the ipmi-remove-device-interface-limits
> patch. It seems that when I have this patch loaded (along with just the
> 3 multinode fix patches listed below), the drivers work fine if ipmi_si
> is loaded last, but if ipmi_si loaded before ipmi_devintf, the system
> oopses (appended below). Does this patch require one or more of the
> other patches in the 39.1 set to be happy (for instance, the
> allow-hot-smi-remove patch), or am I running into some other issue?
>

I looked at this yesterday and today, I cannot figure out what would be
different between the two scenarios. I could not reproduce this, and
it's probably best to just take all the patches as that is what I
tested. I did test different loading orders in my testing.

I'll try to look at this again today.

> Also, I wanted to let you know that I was able to get some time on an
> 8-way node and tested the following 39.1 patches:
> ipmi-fix-device-model-name.patch,
> ipmi-remove-interface-number-limits.patch
> ipmi-handle-sysfs-errors.patch
> ipmi-pass-sysfs-name-from-lower-level-driver.patch
>
> They seemed to work fine (with the drivers loaded in the "good" order
> described above). All 8 device nodes were created and seemed to be
> equally usable.
>

Ok, thanks.

> I got a bit of info about the order in which the SMBIOS table is
> populated and found out that it's currently populated in order of
> increasing KCS I/O address but that this isn't necessarily an ordering
> scheme that can be assumed for the future. Also, regarding changing the
> BIOS to make the deviceID unique across BMCs, I was told that if these
> changes were made, we would likely be facing many issues such as
> DeviceID mismatches with what's coded up in the SDR data, etc. So I
> suspect it's something that might not happen anytime soon (if ever).
>

That really doesn't make any sense. The only place I could find where
this Device ID is used is in the type 13 SDR: "Management Controller
Confirmation Record". This record is used by utility software to record
that it found a specific management controller in the system. It seems
of limited value to me, anyway, and having different device IDs would
seem to make this easier, not harder, to identify the different
management controllers. From what I can tell, the use of this is for
system software to record the current management controller
configuration. Then if system software finds something different, it
can say "Hey, something changed" and handle it.

Note that the term "Device ID" is heavily overloaded in the IPMI spec.
It also has "FRU Device ID" and "Device ID String", but those are
completely different things.

I see no other reliable mechanism to correlate management controllers
with nodes, especially if nodes ever become dynamic. I really doubt you
will have any issues unless you have software that is hardcoded to
handle this. That doesn't seem so, since they are all the same and it
doesn't provide any real useful information. Perhaps the group doing
the work can suggest a reliable way to correlate the nodes and the
management controllers?

Thanks

-Corey
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Corey,

I wanted to let you know about some of the testing I've done with some
of the new 39.1 patches and also to ask you about an issue I found.

First, I wanted to ask you about the ipmi-remove-device-interface-limits
patch. It seems that when I have this patch loaded (along with just the
3 multinode fix patches listed below), the drivers work fine if ipmi_si
is loaded last, but if ipmi_si loaded before ipmi_devintf, the system
oopses (appended below). Does this patch require one or more of the
other patches in the 39.1 set to be happy (for instance, the
allow-hot-smi-remove patch), or am I running into some other issue?

Also, I wanted to let you know that I was able to get some time on an
8-way node and tested the following 39.1 patches:
ipmi-fix-device-model-name.patch,
ipmi-remove-interface-number-limits.patch
ipmi-handle-sysfs-errors.patch
ipmi-pass-sysfs-name-from-lower-level-driver.patch

They seemed to work fine (with the drivers loaded in the "good" order
described above). All 8 device nodes were created and seemed to be
equally usable.

I got a bit of info about the order in which the SMBIOS table is
populated and found out that it's currently populated in order of
increasing KCS I/O address but that this isn't necessarily an ordering
scheme that can be assumed for the future. Also, regarding changing the
BIOS to make the deviceID unique across BMCs, I was told that if these
changes were made, we would likely be facing many issues such as
DeviceID mismatches with what's coded up in the SDR data, etc. So I
suspect it's something that might not happen anytime soon (if ever).

Anyway, hope this info is useful.

Thanks for all your help,

Carol Hebert

Unable to handle kernel paging request at 000101ab RIP:
 [] kref_get+0xc/0x47
 (<--this is where kref gets dereferenced to get refcount; RDI/RBX hold kref)
PGD 2894c067 PUD 0
Oops: [1] SMP
last sysfs file: /class/ipmi/ipmi0/dev
CPU 0
Modules linked in: ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U)
 autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U)
 cpufreq_ondemand(U) video(U) sbs(U) i2c_ec(U) i2c_core(U) button(U)
 battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) ipv6(U) parport_pc(U)
 lp(U) parport(U) sg(U) intel_rng(U) shpchp(U) bnx2(U) tg3(U) pcspkr(U)
 serio_raw(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U)
 ata_piix(U) libata(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U)
 ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3955, comm: modprobe Not tainted 2.6.18-2714_ipmitest #2
RIP: 0010:[] [] kref_get+0xc/0x47
RSP: :810028bfdb68 EFLAGS: 00010292
RAX: 81003b5ee9d0 RBX: 000101ab RCX: 81003b5ee000
RDX: 81003b5ee9d7 RSI: 802dc7c7 RDI: 000101ab
RBP: 810028bfdb78 R08: 8058a8d0 R09:
R10: 81003b5ee9d0 R11: 0020 R12: fff4
R13: 802dc7c0 R14: 81002cb95700 R15: 81003b1c3f08
FS: 2aac5240() GS:8049b000() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 000101ab CR3: 28462000 CR4: 06e0
Process modprobe (pid: 3955, threadinfo 810028bfc000, task 8100029e00c0)
Stack: 810028bfdb98 0001018f 810028bfdb98 8005b2c8
 81002cb95700 81003b5eee68 810028bfdbd8 80110830
 0001018f 810035098270 8100336204c8
Call Trace:
 [] kobject_get+0x1a/0x21
 [] sysfs_create_link+0xbb/0x116
 [] class_device_add+0x267/0x46f
 [] class_device_register+0x19/0x1d
 [] class_device_create+0xf8/0x129
 [] :ipmi_devintf:ipmi_new_smi+0x72/0x98
 [] :ipmi_msghandler:ipmi_smi_watcher_register+0xd8/0x12f
 [] :ipmi_devintf:init_ipmi_devintf+0xc7/0x100
 [] sys_init_module+0x1708/0x18cc
 [] tracesys+0xd1/0xdb
DWARF2 unwinder stuck at tracesys+0xd1/0xdb
Leftover inexact backtrace:

Code: 8b 07 85 c0 75 2e e8 fe a1 05 00 48 c7 c1 40 a5 29 80 49 89
RIP [] kref_get+0xc/0x47
 RSP
CR2: 000101ab

=
[ BUG: lock held at task exit time! ]
-
modprobe/3955 is exiting with locks still held!
2 locks held by modprobe/3955:
 #0: (reg_list_mutex){--..}, at: [] mutex_lock+0x2a/0x2e
 #1: (&sysfs_inode_imutex_key){--..}, at: [] mutex_lock+0x2a/0x2e

stack backtrace:

Call Trace:
 [] show_trace+0xae/0x336
 [] dump_stack+0x15/0x17
 [] debug_check_no_locks_held+0x87/0x8b
 [] do_exit+0x8c2/0x911
 [] do_page_fault+0x7ba/0x842
 [] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:
 [] kref_get+0xc/0x47
 [] __kmalloc+0x125/0x134
 [] kobject_get+0x1a/0x21
 [] sysfs_create_link+0xbb/0x116
 [] class_device_add+0x267/0x46f
 [] class_device_register+0x19/0x1d
 [] class_device_create+0xf8/0x129
 [] __mutex_lock_slowpath+0x248/0x261
 [] mark_held_locks+0x53/0x79
 [] mutex_lock+0x2a/0x2e
 [] __mutex_lock_slowpath+0x248/0x261
 [] debug_mutex_free_waiter+0x5a/0x5e
 [] __mutex_lock_slowp
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
On Fri, 2006-10-20 at 14:34 -0500, Corey Minyard wrote:
> >
> Hmm, that might be harder on 2.4. I have to review the set of patches
> to see what will go on for the 2.4 release, so I'll look at it then.
> It seems to me that 2.4 and the multi-node beasts wouldn't be a good
> match, but if it's needed...

Thanks much. :-) I do know of folks who run 2.4 kernels and support up
to a 4-node system. I guess that specific configuration would work fine
with the current table code. Although I don't know of anyone offhand
who's running 2.4 on larger configs, we do support larger systems and we
support 2.4 distros so it's not inconceivable that someone will
eventually put the two together. :-}

Thanks much again,

Carol Hebert
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote:
> On Thu, 2006-10-19 at 21:46 -0500, Corey Minyard wrote:
>
>> .
>>
>> I'm waiting for one more patch to be finished up and tested, and I'm
>> putting out a 2.6.18 patch set.
>>
>>
>
> That's excellent news! I'll run the patch set on my multi-nodes as soon
> as it's out. BTW: I was wondering if it would be much trouble to get
> the table->list patch put into the 2.4 tree as well? I'd be happy to
> help and would be happy to test it out on a multi-node. :-)
>

Hmm, that might be harder on 2.4. I have to review the set of patches
to see what will go on for the 2.4 release, so I'll look at it then.
It seems to me that 2.4 and the multi-node beasts wouldn't be a good
match, but if it's needed...

-Corey
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
On Thu, 2006-10-19 at 21:46 -0500, Corey Minyard wrote:
> .
>
> I'm waiting for one more patch to be finished up and tested, and I'm
> putting out a 2.6.18 patch set.
>

That's excellent news! I'll run the patch set on my multi-nodes as soon
as it's out. BTW: I was wondering if it would be much trouble to get
the table->list patch put into the 2.4 tree as well? I'd be happy to
help and would be happy to test it out on a multi-node. :-)

Thank you very much, :-)

Carol Hebert
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote:
> Hi,
>
> Wow! I barely hit return on my email and the patch was in my
> inbox!! :-)
>

Well, I had it sitting there, so it was easy. Sorry about the compile
errors, those fixes had snuck into a later patch but didn't get put into
the right place.

I'm waiting for one more patch to be finished up and tested, and I'm
putting out a 2.6.18 patch set.

-Corey
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi,

Wow! I barely hit return on my email and the patch was in my
inbox!! :-)

I made a couple of adjustments to the patch to make my compiler happy.
In the ipmi_smi_watcher_register() routine, I deleted the "&" on
to_deliver; also, I added GFP_KERNEL as a second arg to kmalloc:

 int ipmi_smi_watcher_register(struct ipmi_smi_watcher *watcher)
 {
 	ipmi_smi_t intf;
-	struct list_head to_deliver = LIST_HEAD_INIT(&to_deliver);
+	struct list_head to_deliver = LIST_HEAD_INIT(to_deliver);
 	struct watcher_entry *e, *e2;

 	mutex_lock(&ipmi_interfaces_mutex);

 	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
 		if (intf->intf_num == -1)
 			continue;
-		e = kmalloc(sizeof(*e));
+		e = kmalloc(sizeof(*e), GFP_KERNEL);
 		if (!e)
 			goto out_err;
 		e->intf_num = intf->intf_num;
 		list_add_tail(&e->link, &to_deliver);
 	}

I ran it on my 2-node system and it seemed to work as well as the
previous table-oriented patched version (e.g. great! :-). I'm still
working on getting an 8-node to test it on -- hopefully I'll get one
next week.

Thanks very much again, :-)

Carol Hebert

On Thu, 2006-10-19 at 16:23 -0500, Corey Minyard wrote:
> Ok, patch is attached.
>
> Carol Hebert wrote:
> > On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
> >
> >> Hi Corey,
> >>
> >> This latest patch worked great on my 2-node system! :-D I'll try to
> >> get some time on a 4-node and 8-node system asap to test it out on them
> >> as well.
> >>
> >
> > Oops, I guess I'll probably need that patch you were talking about
> > earlier to increase the number of supported nodes to > 4 to test the
> > 8-node system properly. :-} I think you mentioned changing the table
> > to a list to be able to support an arbitrary number of devices? I was
> > wondering if you had any idea when you might be able to get a chance to
> > make that change?
> >
> > Thanks again for all your help,
> >
> > Carol Hebert
> >
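The two compile fixes Carol describes in this message can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `struct list_head`, `LIST_HEAD_INIT`, `list_empty` and `list_add_tail` are re-created here with the same shape as the kernel's, and `demo_count` is a hypothetical helper invented for the illustration. The point is that the macro takes the variable's name and applies `&` itself, so passing `&to_deliver` would expand to the address of an address and fail to compile.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same shape as the kernel macro: the argument is the variable's NAME,
 * and the macro takes its address.  LIST_HEAD_INIT(&x) would expand to
 * { &(&x), &(&x) }, which is not valid C. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical helper for this sketch: declares a local list head the
 * way the corrected patch declares to_deliver, appends n nodes, and
 * returns how many nodes a forward walk finds. */
static int demo_count(int n)
{
	struct list_head to_deliver = LIST_HEAD_INIT(to_deliver);
	struct list_head nodes[8];
	struct list_head *pos;
	int i, count = 0;

	assert(n >= 0 && n <= 8);
	assert(list_empty(&to_deliver));   /* head points at itself */

	for (i = 0; i < n; i++)
		list_add_tail(&nodes[i], &to_deliver);
	for (pos = to_deliver.next; pos != &to_deliver; pos = pos->next)
		count++;
	return count;
}
```

The second fix has no userspace equivalent: in the kernel, `kmalloc()` takes the allocation flags as a required second argument. `GFP_KERNEL` permits the allocator to sleep, which is presumably acceptable here because the loop in the patch runs under a mutex rather than the spinlock it replaced.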
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Ok, patch is attached.

Carol Hebert wrote:
> On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
>
>> Hi Corey,
>>
>> This latest patch worked great on my 2-node system! :-D I'll try to
>> get some time on a 4-node and 8-node system asap to test it out on them
>> as well.
>>
>
> Oops, I guess I'll probably need that patch you were talking about
> earlier to increase the number of supported nodes to > 4 to test the
> 8-node system properly. :-} I think you mentioned changing the table
> to a list to be able to support an arbitrary number of devices? I was
> wondering if you had any idea when you might be able to get a chance to
> make that change?
>
> Thanks again for all your help,
>
> Carol Hebert
>

This patch removes the arbitrary limit of number of IPMI interfaces.

Signed-off-by: Corey Minyard <[EMAIL PROTECTED]>

Index: linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
===
--- linux-2.6.18.orig/drivers/char/ipmi/ipmi_msghandler.c
+++ linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
@@ -193,6 +193,9 @@ struct ipmi_smi
 
 	struct kref refcount;
 
+	/* Used for a list of interfaces. */
+	struct list_head link;
+
 	/* The list of upper layers that are using me. seq_lock
 	 * protects this. */
 	struct list_head users;
@@ -338,13 +341,6 @@ struct ipmi_smi
 };
 #define to_si_intf_from_dev(device) container_of(device, struct ipmi_smi, dev)
 
-/* Used to mark an interface entry that cannot be used but is not a
- * free entry, either, primarily used at creation and deletion time so
- * a slot doesn't get reused too quickly. */
-#define IPMI_INVALID_INTERFACE_ENTRY ((ipmi_smi_t) ((long) 1))
-#define IPMI_INVALID_INTERFACE(i) (((i) == NULL) \
-				   || (i == IPMI_INVALID_INTERFACE_ENTRY))
-
 /**
  * The driver model view of the IPMI messaging driver.
  */
@@ -354,11 +350,8 @@ static struct device_driver ipmidriver =
 };
 static DEFINE_MUTEX(ipmidriver_mutex);
 
-#define MAX_IPMI_INTERFACES 4
-static ipmi_smi_t ipmi_interfaces[MAX_IPMI_INTERFACES];
-
-/* Directly protects the ipmi_interfaces data structure. */
-static DEFINE_SPINLOCK(interfaces_lock);
+static struct list_head ipmi_interfaces = LIST_HEAD_INIT(ipmi_interfaces);
+static DEFINE_MUTEX(ipmi_interfaces_mutex);
 
 /* List of watchers that want to know when smi's are added and
    deleted. */
@@ -413,25 +406,50 @@ static void intf_free(struct kref *ref)
 	kfree(intf);
 }
 
+struct watcher_entry {
+	struct list_head link;
+	int intf_num;
+};
+
 int ipmi_smi_watcher_register(struct ipmi_smi_watcher *watcher)
 {
-	int i;
-	unsigned long flags;
+	ipmi_smi_t intf;
+	struct list_head to_deliver = LIST_HEAD_INIT(&to_deliver);
+	struct watcher_entry *e, *e2;
+
+	mutex_lock(&ipmi_interfaces_mutex);
+
+	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
+		if (intf->intf_num == -1)
+			continue;
+		e = kmalloc(sizeof(*e));
+		if (!e)
+			goto out_err;
+		e->intf_num = intf->intf_num;
+		list_add_tail(&e->link, &to_deliver);
+	}
 
 	down_write(&smi_watchers_sem);
 	list_add(&(watcher->link), &smi_watchers);
 	up_write(&smi_watchers_sem);
-	spin_lock_irqsave(&interfaces_lock, flags);
-	for (i = 0; i < MAX_IPMI_INTERFACES; i++) {
-		ipmi_smi_t intf = ipmi_interfaces[i];
-		if (IPMI_INVALID_INTERFACE(intf))
-			continue;
-		spin_unlock_irqrestore(&interfaces_lock, flags);
-		watcher->new_smi(i, intf->si_dev);
-		spin_lock_irqsave(&interfaces_lock, flags);
+
+	mutex_unlock(&ipmi_interfaces_mutex);
+
+	list_for_each_entry_safe(e, e2, &to_deliver, link) {
+		list_del(&e->link);
+		watcher->new_smi(e->intf_num, intf->si_dev);
+		kfree(e);
 	}
-	spin_unlock_irqrestore(&interfaces_lock, flags);
+
+
 	return 0;
+
+ out_err:
+	list_for_each_entry_safe(e, e2, &to_deliver, link) {
+		list_del(&e->link);
+		kfree(e);
+	}
+	return -ENOMEM;
 }
 
 int ipmi_smi_watcher_unregister(struct ipmi_smi_watcher *watcher)
@@ -766,17 +784,19 @@ int ipmi_create_user(unsigned int
 	if (!new_user)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&interfaces_lock, flags);
-	intf = ipmi_interfaces[if_num];
-	if ((if_num >= MAX_IPMI_INTERFACES) || IPMI_INVALID_INTERFACE(intf)) {
-		spin_unlock_irqrestore(&interfaces_lock, flags);
-		rv = -EINVAL;
-		goto out_kfree;
+	rcu_read_lock();
+	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
+		if (intf->intf_num == if_num)
+			goto found;
 	}
+	rcu_read_unlock();
+	rv = -EINVAL;
+
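One detail of the delivery loop in the patch as posted may be worth a second look. The callback is invoked as `watcher->new_smi(e->intf_num, intf->si_dev)`, but by then the first loop has already run to completion, so `intf` no longer refers to any particular interface and only `e->intf_num` was saved per entry. The following is a minimal userspace sketch of the same snapshot-then-deliver idea, assuming each snapshot entry also records the device it belongs to. All type and function names here are hypothetical stand-ins, not the driver's, and this is not the fix that was applied upstream; whether this detail relates to the load-order oops reported elsewhere in the thread is not established here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types. */
struct iface {
	int intf_num;           /* -1 entries are skipped, as in the patch */
	const char *si_dev;     /* stands in for struct device *si_dev */
	struct iface *next;
};

struct snap_entry {
	int intf_num;
	const char *si_dev;     /* captured while the interface is in hand */
	struct snap_entry *next;
};

typedef void (*new_smi_fn)(int intf_num, const char *si_dev, void *ctx);

/* Phase 1 (would run with the interfaces mutex held): copy what the
 * callback needs.  Phase 2 (lock dropped): deliver from the private
 * copy only.  Returns the number of callbacks made, or -1 if an
 * allocation fails, in which case nothing is delivered. */
static int snapshot_and_deliver(const struct iface *ifaces,
				new_smi_fn cb, void *ctx)
{
	struct snap_entry *head = NULL, **tail = &head, *e;
	const struct iface *intf;
	int delivered = 0;

	for (intf = ifaces; intf; intf = intf->next) {
		if (intf->intf_num == -1)
			continue;
		e = malloc(sizeof(*e));
		if (!e)
			goto out_err;
		e->intf_num = intf->intf_num;
		e->si_dev = intf->si_dev;
		e->next = NULL;
		*tail = e;
		tail = &e->next;
	}

	while ((e = head) != NULL) {
		head = e->next;
		cb(e->intf_num, e->si_dev, ctx);
		free(e);
		delivered++;
	}
	return delivered;

out_err:
	while ((e = head) != NULL) {
		head = e->next;
		free(e);
	}
	return -1;
}

/* Example callback: remembers the last interface number delivered. */
static void remember_last(int intf_num, const char *si_dev, void *ctx)
{
	assert(si_dev != NULL);
	*(int *)ctx = intf_num;
}
```

In kernel code, a pointer captured this way would also need something to keep it valid after the lock is dropped, for example a reference taken through the `refcount` kref that `struct ipmi_smi` already carries; the sketch leaves that out.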
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
> Hi Corey,
>
> This latest patch worked great on my 2-node system! :-D I'll try to
> get some time on a 4-node and 8-node system asap to test it out on them
> as well.

Oops, I guess I'll probably need that patch you were talking about
earlier to increase the number of supported nodes to > 4 to test the
8-node system properly. :-} I think you mentioned changing the table
to a list to be able to support an arbitrary number of devices? I was
wondering if you had any idea when you might be able to get a chance to
make that change?

Thanks again for all your help,

Carol Hebert
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
On Wed, 2006-10-18 at 16:17 -0500, Corey Minyard wrote:
> Carol Hebert wrote:
> > Hi Corey,
> >
> > This latest patch worked great on my 2-node system! :-D I'll try to
> > get some time on a 4-node and 8-node system asap to test it out on them
> > as well.
> >
> > I've listed below how ipmi and the BMCs are now represented in sysfs.
> > Do you still want me to continue working on trying to get some unique
> > BMC device ID/GUID change made in the f/w as well (and in the process
> > find out what we have now ;-}? I'm also working on finding out whether
> > or not it's guaranteed that the BMCs are listed in node order in the
> > SMBIOS table.
> >
> It's probably best to get the unique device id in the firmware. That is
> the only sure way to know that a specific IPMI device maps to a specific
> node's BMC, and IMHO it's the right way to do things.

Will do. I'll let you know what I find out and keep you apprised of the
progress.

Thanks again,

Carol Hebert
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote:
> Hi Corey,
>
> This latest patch worked great on my 2-node system! :-D I'll try to
> get some time on a 4-node and 8-node system asap to test it out on them
> as well.
>
> I've listed below how ipmi and the BMCs are now represented in sysfs.
> Do you still want me to continue working on trying to get some unique
> BMC device ID/GUID change made in the f/w as well (and in the process
> find out what we have now ;-}? I'm also working on finding out whether
> or not it's guaranteed that the BMCs are listed in node order in the
> SMBIOS table.
>

It's probably best to get the unique device id in the firmware. That is
the only sure way to know that a specific IPMI device maps to a specific
node's BMC, and IMHO it's the right way to do things.

> Thanks very much for your help and for making my day! :-D
>

You are welcome.

-Corey

> Carol Hebert
>
>
>
> /sys/class/ipmi/ipmi1/device
> /sys/class/ipmi/ipmi1/dev
> /sys/class/ipmi/ipmi1/uevent
> /sys/class/ipmi/ipmi1/subsystem
> /sys/class/ipmi/ipmi0
> /sys/class/ipmi/ipmi0/device
> /sys/class/ipmi/ipmi0/dev
> /sys/class/ipmi/ipmi0/uevent
> /sys/class/ipmi/ipmi0/subsystem
> /sys/bus/pci/drivers/ipmi_si
> /sys/bus/pci/drivers/ipmi_si/new_id
> /sys/bus/pci/drivers/ipmi_si/bind
> /sys/bus/pci/drivers/ipmi_si/unbind
> /sys/bus/pci/drivers/ipmi_si/module
> /sys/bus/platform/drivers/ipmi_si
> /sys/bus/platform/drivers/ipmi_si/ipmi_si.1
> /sys/bus/platform/drivers/ipmi_si/ipmi_si.0
> /sys/bus/platform/drivers/ipmi_si/bind
> /sys/bus/platform/drivers/ipmi_si/unbind
> /sys/bus/platform/drivers/ipmi
> /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.33
> /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.32
> /sys/bus/platform/drivers/ipmi/bind
> /sys/bus/platform/drivers/ipmi/unbind
> /sys/bus/platform/devices/ipmi_bmc.0007.33
> /sys/bus/platform/devices/ipmi_si.1
> /sys/bus/platform/devices/ipmi_bmc.0007.32
> /sys/bus/platform/devices/ipmi_si.0
> /sys/devices/platform/ipmi_bmc.0007.33
> /sys/devices/platform/ipmi_bmc.0007.33/ipmi1
> /sys/devices/platform/ipmi_bmc.0007.33/guid
> /sys/devices/platform/ipmi_bmc.0007.33/aux_firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.33/product_id
> /sys/devices/platform/ipmi_bmc.0007.33/manufacturer_id
> /sys/devices/platform/ipmi_bmc.0007.33/additional_device_support
> /sys/devices/platform/ipmi_bmc.0007.33/ipmi_version
> /sys/devices/platform/ipmi_bmc.0007.33/firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.33/revision
> /sys/devices/platform/ipmi_bmc.0007.33/provides_device_sdrs
> /sys/devices/platform/ipmi_bmc.0007.33/device_id
> /sys/devices/platform/ipmi_bmc.0007.33/driver
> /sys/devices/platform/ipmi_bmc.0007.33/bus
> /sys/devices/platform/ipmi_bmc.0007.33/subsystem
> /sys/devices/platform/ipmi_bmc.0007.33/modalias
> /sys/devices/platform/ipmi_bmc.0007.33/power
> /sys/devices/platform/ipmi_bmc.0007.33/power/wakeup
> /sys/devices/platform/ipmi_bmc.0007.33/power/state
> /sys/devices/platform/ipmi_bmc.0007.33/uevent
> /sys/devices/platform/ipmi_si.1
> /sys/devices/platform/ipmi_si.1/ipmi:ipmi1
> /sys/devices/platform/ipmi_si.1/bmc
> /sys/devices/platform/ipmi_si.1/driver
> /sys/devices/platform/ipmi_si.1/bus
> /sys/devices/platform/ipmi_si.1/subsystem
> /sys/devices/platform/ipmi_si.1/modalias
> /sys/devices/platform/ipmi_si.1/power
> /sys/devices/platform/ipmi_si.1/power/wakeup
> /sys/devices/platform/ipmi_si.1/power/state
> /sys/devices/platform/ipmi_si.1/uevent
> /sys/devices/platform/ipmi_bmc.0007.32
> /sys/devices/platform/ipmi_bmc.0007.32/ipmi0
> /sys/devices/platform/ipmi_bmc.0007.32/guid
> /sys/devices/platform/ipmi_bmc.0007.32/aux_firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.32/product_id
> /sys/devices/platform/ipmi_bmc.0007.32/manufacturer_id
> /sys/devices/platform/ipmi_bmc.0007.32/additional_device_support
> /sys/devices/platform/ipmi_bmc.0007.32/ipmi_version
> /sys/devices/platform/ipmi_bmc.0007.32/firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.32/revision
> /sys/devices/platform/ipmi_bmc.0007.32/provides_device_sdrs
> /sys/devices/platform/ipmi_bmc.0007.32/device_id
> /sys/devices/platform/ipmi_bmc.0007.32/driver
> /sys/devices/platform/ipmi_bmc.0007.32/bus
> /sys/devices/platform/ipmi_bmc.0007.32/subsystem
> /sys/devices/platform/ipmi_bmc.0007.32/modalias
> /sys/devices/platform/ipmi_bmc.0007.32/power
> /sys/devices/platform/ipmi_bmc.0007.32/power/wakeup
> /sys/devices/platform/ipmi_bmc.0007.32/power/state
> /sys/devices/platform/ipmi_bmc.0007.32/uevent
> /sys/devices/platform/ipmi_si.0
> /sys/devices/platform/ipmi_si.0/ipmi:ipmi0
> /sys/devices/platform/ipmi_si.0/bmc
> /sys/devices/platform/ipmi_si.0/driver
> /sys/devices/platform/ipmi_si.0/bus
> /sys/devices/platform/ipmi_si.0/subsystem
> /sys/devices/platform/ipmi_si.0/modalias
> /sys/devices/platform/ipmi_si.0/power
> /sys/devices/platform/ipmi_si.0/power/wakeup
> /sys/devices/platform/ipmi_si.0/power/state
> /sys/devices/platform/ipmi_si.0/uevent
>
>
> # ls -l /dev/ipmi*
> crw--- 1 root root 252
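The sysfs listing quoted above shows a `guid` and a `device_id` attribute under each `ipmi_bmc.*` platform device, which is where the question of whether two BMCs report distinct identities could be checked from user space. A minimal sketch in C follows, assuming only that each attribute is a small text file; the format of the contents is not shown in the thread, so the values are compared as opaque strings. `read_attr` and `attrs_match` are hypothetical helper names made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the first line of a sysfs-style attribute file into buf and
 * strips the trailing newline.  Returns 0 on success, -1 on error. */
static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if both attribute files hold identical text, 0 if they
 * differ, and -1 if either one cannot be read. */
static int attrs_match(const char *path_a, const char *path_b)
{
	char a[128], b[128];

	if (read_attr(path_a, a, sizeof(a)) != 0 ||
	    read_attr(path_b, b, sizeof(b)) != 0)
		return -1;
	return strcmp(a, b) == 0;
}
```

On the 2-node system from the listing, one might call `attrs_match("/sys/devices/platform/ipmi_bmc.0007.32/guid", "/sys/devices/platform/ipmi_bmc.0007.33/guid")`; a result of 1 would suggest the firmware reports the same GUID for both BMCs.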
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Corey, This latest patch worked great on my 2-node system! :-D I'll try to get some time on a 4-node and 8-node system asap to test it out on them as well. I've listed below how ipmi and the BMCs are now represented in sysfs. Do you still want me to continue working on trying to get some unique BMC device ID/GUID change made in the f/w as well (and in the process find out what we have now ;-}? I'm also working on finding out whether or not it's guaranteed that the BMCs are listed in node order in the SMBIOS table. Thanks very much for your help and for making my day! :-D Carol Hebert /sys/class/ipmi/ipmi1/device /sys/class/ipmi/ipmi1/dev /sys/class/ipmi/ipmi1/uevent /sys/class/ipmi/ipmi1/subsystem /sys/class/ipmi/ipmi0 /sys/class/ipmi/ipmi0/device /sys/class/ipmi/ipmi0/dev /sys/class/ipmi/ipmi0/uevent /sys/class/ipmi/ipmi0/subsystem /sys/bus/pci/drivers/ipmi_si /sys/bus/pci/drivers/ipmi_si/new_id /sys/bus/pci/drivers/ipmi_si/bind /sys/bus/pci/drivers/ipmi_si/unbind /sys/bus/pci/drivers/ipmi_si/module /sys/bus/platform/drivers/ipmi_si /sys/bus/platform/drivers/ipmi_si/ipmi_si.1 /sys/bus/platform/drivers/ipmi_si/ipmi_si.0 /sys/bus/platform/drivers/ipmi_si/bind /sys/bus/platform/drivers/ipmi_si/unbind /sys/bus/platform/drivers/ipmi /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.33 /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.32 /sys/bus/platform/drivers/ipmi/bind /sys/bus/platform/drivers/ipmi/unbind /sys/bus/platform/devices/ipmi_bmc.0007.33 /sys/bus/platform/devices/ipmi_si.1 /sys/bus/platform/devices/ipmi_bmc.0007.32 /sys/bus/platform/devices/ipmi_si.0 /sys/devices/platform/ipmi_bmc.0007.33 /sys/devices/platform/ipmi_bmc.0007.33/ipmi1 /sys/devices/platform/ipmi_bmc.0007.33/guid /sys/devices/platform/ipmi_bmc.0007.33/aux_firmware_revision /sys/devices/platform/ipmi_bmc.0007.33/product_id /sys/devices/platform/ipmi_bmc.0007.33/manufacturer_id /sys/devices/platform/ipmi_bmc.0007.33/additional_device_support /sys/devices/platform/ipmi_bmc.0007.33/ipmi_version 
/sys/devices/platform/ipmi_bmc.0007.33/firmware_revision /sys/devices/platform/ipmi_bmc.0007.33/revision /sys/devices/platform/ipmi_bmc.0007.33/provides_device_sdrs /sys/devices/platform/ipmi_bmc.0007.33/device_id /sys/devices/platform/ipmi_bmc.0007.33/driver /sys/devices/platform/ipmi_bmc.0007.33/bus /sys/devices/platform/ipmi_bmc.0007.33/subsystem /sys/devices/platform/ipmi_bmc.0007.33/modalias /sys/devices/platform/ipmi_bmc.0007.33/power /sys/devices/platform/ipmi_bmc.0007.33/power/wakeup /sys/devices/platform/ipmi_bmc.0007.33/power/state /sys/devices/platform/ipmi_bmc.0007.33/uevent /sys/devices/platform/ipmi_si.1 /sys/devices/platform/ipmi_si.1/ipmi:ipmi1 /sys/devices/platform/ipmi_si.1/bmc /sys/devices/platform/ipmi_si.1/driver /sys/devices/platform/ipmi_si.1/bus /sys/devices/platform/ipmi_si.1/subsystem /sys/devices/platform/ipmi_si.1/modalias /sys/devices/platform/ipmi_si.1/power /sys/devices/platform/ipmi_si.1/power/wakeup /sys/devices/platform/ipmi_si.1/power/state /sys/devices/platform/ipmi_si.1/uevent /sys/devices/platform/ipmi_bmc.0007.32 /sys/devices/platform/ipmi_bmc.0007.32/ipmi0 /sys/devices/platform/ipmi_bmc.0007.32/guid /sys/devices/platform/ipmi_bmc.0007.32/aux_firmware_revision /sys/devices/platform/ipmi_bmc.0007.32/product_id /sys/devices/platform/ipmi_bmc.0007.32/manufacturer_id /sys/devices/platform/ipmi_bmc.0007.32/additional_device_support /sys/devices/platform/ipmi_bmc.0007.32/ipmi_version /sys/devices/platform/ipmi_bmc.0007.32/firmware_revision /sys/devices/platform/ipmi_bmc.0007.32/revision /sys/devices/platform/ipmi_bmc.0007.32/provides_device_sdrs /sys/devices/platform/ipmi_bmc.0007.32/device_id /sys/devices/platform/ipmi_bmc.0007.32/driver /sys/devices/platform/ipmi_bmc.0007.32/bus /sys/devices/platform/ipmi_bmc.0007.32/subsystem /sys/devices/platform/ipmi_bmc.0007.32/modalias /sys/devices/platform/ipmi_bmc.0007.32/power /sys/devices/platform/ipmi_bmc.0007.32/power/wakeup /sys/devices/platform/ipmi_bmc.0007.32/power/state 
/sys/devices/platform/ipmi_bmc.0007.32/uevent /sys/devices/platform/ipmi_si.0 /sys/devices/platform/ipmi_si.0/ipmi:ipmi0 /sys/devices/platform/ipmi_si.0/bmc /sys/devices/platform/ipmi_si.0/driver /sys/devices/platform/ipmi_si.0/bus /sys/devices/platform/ipmi_si.0/subsystem /sys/devices/platform/ipmi_si.0/modalias /sys/devices/platform/ipmi_si.0/power /sys/devices/platform/ipmi_si.0/power/wakeup /sys/devices/platform/ipmi_si.0/power/state /sys/devices/platform/ipmi_si.0/uevent # ls -l /dev/ipmi* crw--- 1 root root 252, 0 Oct 18 11:52 /dev/ipmi0 crw--- 1 root root 252, 1 Oct 18 11:52 /dev/ipmi1 On Tue, 2006-10-17 at 17:22 -0500, Corey Minyard wrote: > Corey Minyard wrote: > > > >> Please let me know what I can do to help. In the meantime, I'll take a > >> look at the current code and try to figure out why it's still oopsing. > >> > > I thought the oops was fixed. If not, can you send one? > > > > As far as things you can do, I'm not really sure. I don't have en
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Corey Minyard wrote:
>
>> Please let me know what I can do to help. In the meantime, I'll take a
>> look at the current code and try to figure out why it's still oopsing.
>>
> I thought the oops was fixed. If not, can you send one?
>
> As far as things you can do, I'm not really sure. I don't have enough
> details on how this hardware works to design a solution. This is really
> nitty-gritty detail information, like how the nodes map their BMC
> addresses and how the SMBIOS table is populated. If the BMCs appeared
> in the SMBIOS tables in node order, then the solution is very easy, just
> detect and add 1 for each. I could just print a warning at startup when
> it detects this and it would probably cover a multitude of future evils :-).
>
>
I thought about this some more, and it's a good idea, I believe, to do
this. The patch was easy, and I have tested it using a simulator.

Note that I found a bug in the product id stuff. It may be that your
BMCs don't have a *device* GUID or at least a unique device GUID. (Note
that a device GUID is different than a system GUID, and your system may
only have a system GUID. The system GUID is supposed to be the same for
the entire system, but each BMC is supposed to have its own unique
device GUID if it supports that). I was passing a 16-bit value as an
unsigned char in the compare routine, so it never matched based on
product/device id. So with this patch, either you will get the previous
behavior (if your system supports device GUIDs) or all the BMCs will
appear to be a single BMC with multiple interfaces to it (if device
GUIDs are not supported).

This patch replaces the previous one I sent you.

-Corey

This patch adds the product id to the driver model platform device
name, in addition to the device id. The IPMI spec does not require
that individual BMCs in a system have unique device IDs, but it does
require that the product id/device id combination be unique.

This also removes a redundant check and cleans up error handling when
the sysfs registration fails. It also passes in the sysfs name from
the lower-level driver, as the coming IPMI serial driver will need
that to link properly from the serial device sysfs directory.

Index: linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
===
--- linux-2.6.18.orig/drivers/char/ipmi/ipmi_msghandler.c
+++ linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
@@ -202,6 +202,7 @@ struct ipmi_smi
 	struct bmc_device *bmc;
 	char *my_dev_name;
+	char *sysfs_name;
 	/* This is the lower-layer's sender routine. */
 	struct ipmi_smi_handlers *handlers;
@@ -1807,13 +1808,12 @@ static int __find_bmc_prod_dev_id(struct
 	struct bmc_device *bmc = dev_get_drvdata(dev);
 	return (bmc->id.product_id == id->product_id
-		&& bmc->id.product_id == id->product_id
 		&& bmc->id.device_id == id->device_id);
 }
 static struct bmc_device *ipmi_find_bmc_prod_dev_id(
 	struct device_driver *drv,
-	unsigned char product_id, unsigned char device_id)
+	unsigned int product_id, unsigned char device_id)
 {
 	struct prod_dev_id id = {
 		.product_id = product_id,
@@ -1930,6 +1930,9 @@ static ssize_t guid_show(struct device *
 static void remove_files(struct bmc_device *bmc)
 {
+	if (!bmc->dev)
+		return;
+
 	device_remove_file(&bmc->dev->dev,
 			   &bmc->device_id_attr);
 	device_remove_file(&bmc->dev->dev,
@@ -1963,7 +1966,8 @@ cleanup_bmc_device(struct kref *ref)
 	bmc = container_of(ref, struct bmc_device, refcount);
 	remove_files(bmc);
-	platform_device_unregister(bmc->dev);
+	if (bmc->dev)
+		platform_device_unregister(bmc->dev);
 	kfree(bmc);
 }
@@ -1971,7 +1975,11 @@ static void ipmi_bmc_unregister(ipmi_smi
 {
 	struct bmc_device *bmc = intf->bmc;
-	sysfs_remove_link(&intf->si_dev->kobj, "bmc");
+	if (intf->sysfs_name) {
+		sysfs_remove_link(&intf->si_dev->kobj, intf->sysfs_name);
+		kfree(intf->sysfs_name);
+		intf->sysfs_name = NULL;
+	}
 	if (intf->my_dev_name) {
 		sysfs_remove_link(&bmc->dev->dev.kobj, intf->my_dev_name);
 		kfree(intf->my_dev_name);
@@ -1980,6 +1988,7 @@ static void ipmi_bmc_unregister(ipmi_smi
 	mutex_lock(&ipmidriver_mutex);
 	kref_put(&bmc->refcount, cleanup_bmc_device);
+	intf->bmc = NULL;
 	mutex_unlock(&ipmidriver_mutex);
 }
@@ -1987,6 +1996,56 @@ static int create_files(struct bmc_devic
 {
 	int err;
+	bmc->device_id_attr.attr.name = "device_id";
+	bmc->device_id_attr.attr.owner = THIS_MODULE;
+	bmc->device_id_attr.attr.mode = S_IRUGO;
+	bmc->device_id_attr.show = device_id_show;
+
+	bmc->provides_dev_sdrs_attr.attr.name = "provides_device_sdrs";
+	bmc->provides_dev_sdrs_attr.attr.owner = THIS_MODULE;
+	bmc->provides_dev_sdrs_attr.attr.mode = S_IRUGO;
+	bmc->provides_dev_sdrs_attr.show = provides_dev_sdrs_show;
+
+	bmc->revision_attr.attr.name = "revision";
+	bmc->revision_attr.attr.owner = THIS_MODULE;
+	bmc->revision_attr.attr.mode = S_IRUGO;
+	bmc->revision_attr.show = revision_show;
+
+	bmc->firmware_rev_at
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote:
> Hi Corey,
>
> Sorry I wasn't able to reply sooner. I wanted to discuss this with a
> couple of other folks first.
>
> As per your first solution listed below, I'm going to propose asap that
> we modify the f/w to ensure that the device ID is unique for every BMC.
> I don't know yet if the proposal will be accepted, but assuming it is,
> it would solve the problem for the long term. However, maybe we ought
> to supplement that solution with one of the other solutions you listed
> (or some combination thereof) to solve the problem in the interim until
> the new f/w is released and also in general for users running the older
> (current) f/w?
>
Ok, that's probably good for short term. Do you know if devices with
different firmware might mix? That might end up being messy.

> Creating an internal mapping table of the different BMCs found at probe
> time (and maybe setting the device ID to be an index into the table)
> might be useful. Using the GUID (as you imply) would be messy, however,
> not saving the GUID/address in some form for later use might make it
> difficult to know for sure where each BMC is physically located unless
> it's guaranteed that the BMCs are probed in order. Do you know if
> there's a guarantee that the BMCs are probed sequentially in a multi-BMC
> system?
>
BMCs are probed in the order that they appear in the SMBIOS/ACPI tables.
Note that I have a new patch to allow hot adding/removing BMCs. You might
be able to move the problem to userspace and handle it there.

> Please let me know what I can do to help. In the meantime, I'll take a
> look at the current code and try to figure out why it's still oopsing.
>
I thought the oops was fixed. If not, can you send one?

As far as things you can do, I'm not really sure. I don't have enough
details on how this hardware works to design a solution. This is really
nitty-gritty detail information, like how the nodes map their BMC
addresses and how the SMBIOS table is populated.
If the BMCs appeared in the SMBIOS tables in node order, then the solution is very easy, just detect and add 1 for each. I could just print a warning at startup when it detects this and it would probably cover a multitude of future evils :-). -Corey > Thanks for your help, > > Carol Hebert > > >>> >>> >> The "easily digestible form" part is the problem here. You need some >> method to correlate a GUID to something a human being can use to >> identify a system, and it would be nice if it wasn't custom for every >> installed system out there. The address is perhaps better, but I'm >> going to have to have some way to translate the addresses to system >> numbers, and it's going to have to be OEM for this type of hardware, and >> the addresses are not available at the level this is happening, this >> code is generic for all interface types. >> >> So what we can do, in my order of preference :-) : >> >>1. Modify the IPMI firmware to set the device id to a unique number >> for every BMC in the system. It would be really nice if this was >> done in a way that the device ids could be correlated with >> physical systems. This will work with the IPMI driver as-is, and >> I checked and udev translations can be done as-is, too, I believe. >>2. Use some OEM IPMI command that could query the physical system >> number, if something like this exists. >>3. Create an OEM handler to use the GUID to map to physical systems. >> I'm going to need some help with this, I have no idea how to do >> this. Looking at the GUID format (Table 20-10 in the IPMI 2.0 >> spec), I don't see any way to do this. The node field, BTW, is >> supposed to be the 802.x MAC address. >>4. Use the I/O address. This introduces a lot of headaches into the >> structure of the IPMI driver as the address has to be propagated >> from the interface-specific handler to the generic code, and it >> introduces an OEM handler. And I'll need some way to map the I/O >> addresses to physical systems. >> >> Any more ideas? 
>> >> -Corey >> >>> Regarding the ipmi device support currently being fixed at a max of 4, >>> the largest multi-node configuration we currently have is 8 so we would >>> need to have the table size bumped up to at least 8. However, for >>> future support, it might be useful to increase it even more (12, 16?). >>> >>> >> I'll probably just make it a list and get rid of the table so it can be >> arbitrary counts. >> >>> Finally, I don't believe dynamic node plugging will generally be an >>> issue for my system since the nodes are merged at boot time rather than >>> being dynamically added and/or removed. >>> >>> >> So the time is not here yet, but I'm sure it's coming someday :) I can >> wait on this one, then, but I decided it would be pretty easy to do >> through the hotplug subsystem. >> >> -Corey >> >>> Thanks very much,
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Corey, Sorry I wasn't able to reply sooner. I wanted to discuss this with a couple of other folks first. As per your first solution listed below, I'm going to propose asap that we modify the f/w to ensure that the device ID is unique for every BMC. I don't know yet if the proposal will be accepted, but assuming it is, it would solve the problem for the long term. However, maybe we ought to supplement that solution with one of the other solutions you listed (or some combination thereof) to solve the problem in the interim until the new f/w is released and also in general for users running the older (current) f/w? Creating an internal mapping table of the different BMCs found at probe time (and maybe setting the device ID to be an index into the table) might be useful. Using the GUID (as you imply) would be messy, however, not saving the GUID/address in some form for later use might make it difficult to know for sure where each BMC is physically located unless it's guaranteed that the BMCs are probed in order. Do you know if there's a guarantee that the BMCs are probed sequentially in a multi-BMC system? Please let me know what I can do to help. In the meantime, I'll take a look at the current code and try to figure out why it's still oopsing. Thanks for your help, Carol Hebert > > > The "easily digestible form" part is the problem here. You need some > method to correlate a GUID to something a human being can use to > identify a system, and it would be nice if it wasn't custom for every > installed system out there. The address is perhaps better, but I'm > going to have to have some way to translate the addresses to system > numbers, and it's going to have to be OEM for this type of hardware, and > the addresses are not available at the level this is happening, this > code is generic for all interface types. > > So what we can do, in my order of preference :-) : > >1. Modify the IPMI firmware to set the device id to a unique number > for every BMC in the system. 
It would be really nice if this was > done in a way that the device ids could be correlated with > physical systems. This will work with the IPMI driver as-is, and > I checked and udev translations can be done as-is, too, I believe. >2. Use some OEM IPMI command that could query the physical system > number, if something like this exists. >3. Create an OEM handler to use the GUID to map to physical systems. > I'm going to need some help with this, I have no idea how to do > this. Looking at the GUID format (Table 20-10 in the IPMI 2.0 > spec), I don't see any way to do this. The node field, BTW, is > supposed to be the 802.x MAC address. >4. Use the I/O address. This introduces a lot of headaches into the > structure of the IPMI driver as the address has to be propagated > from the interface-specific handler to the generic code, and it > introduces an OEM handler. And I'll need some way to map the I/O > addresses to physical systems. > > Any more ideas? > > -Corey > > Regarding the ipmi device support currently being fixed at a max of 4, > > the largest multi-node configuration we currently have is 8 so we would > > need to have the table size bumped up to at least 8. However, for > > future support, it might be useful to increase it even more (12, 16?). > > > I'll probably just make it a list and get rid of the table so it can be > arbitrary counts. > > Finally, I don't believe dynamic node plugging will generally be an > > issue for my system since the nodes are merged at boot time rather than > > being dynamically added and/or removed. > > > So the time is not here yet, but I'm sure it's coming someday :) I can > wait on this one, then, but I decided it would be pretty easy to do > through the hotplug subsystem. > > -Corey > > Thanks very much, > > > > Carol Hebert > > > > On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote: > > > >> Now the driver is doing exactly what it is supposed to do, but now that > >> may not be what we want. 
I'm not sure of the configuration of this > >> system, but the information below gives me some clues. Here's my guess > >> on the system: > >> > >> This is a NUMA system with hot-plug CPU boards. Each board has an IPMI > >> controller on it. The BIOS maps the I/O address and SMBIOS tables for > >> the IPMI controller to different I/O locations based upon the slot the > >> board is in. There are a number of problems beyond this one for a > >> configuration of this nature. I'll address those later. > >> > >> In response to your question, I believe this is exactly what the Device > >> ID in IPMI is intended for. Each board in the system should have a > >> unique device id based upon the slot it is in. Say you have an > >> application that monitors the CPU temperature of all the CPUs. If a > >> temperature goes out of range, you want to know which board that CPU is > >> on
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Carol Hebert wrote: > Hi, > > I believe your assessment of my x460 dual-node system configuration is > correct with the exception of maybe changing the word "slot" to "system" > since the nodes are joined by scalability cables rather than being > connected via a common backplane. > > Regarding the uniqueness of the Device ID, I think you mentioned in an > earlier email that the spec was a bit contradictory on the topic. I > took a look at the spec and agree that it is not at all clear whether > the Device ID should be unique for all controllers or only for ones that > support a different set of application commands/OEM fields. In one > paragraph, it states that: > > "Controllers that implement identical sets of applications (sic) > commands can have the same Device ID in a given system. Thus a > 'standardized' controller could be produced where multiple instances of > the controller are used in a system, and all have the same Device ID > value. The controllers would still be differentiable by their > address..." > > and in the *immediately following* paragraph, it states > > "A controller can optionally use the Device ID as an 'instance' > identifier if more than one controller of that kind is used in the > system." (It then goes on to say that the GUID, however, is the > preferred method of uniquely identifying controllers.) > > Sheesh. :-} > > In checking out the dmidecode data, I verified that the addresses of the > controllers on the multi-node system are unique and available there. So > both the GUID and the address are unique for the controllers on the > multi-node system whereas the Device ID is not. Can we use the GUID > (maybe in some more easily digestible form) or the address instead of > the Device ID? It seems like the only thing that's clear from the spec > is that the Device ID's uniqueness is something we can't count on. > The "easily digestible form" part is the problem here. 
You need some method to correlate a GUID to something a human being can use to identify a system, and it would be nice if it wasn't custom for every installed system out there. The address is perhaps better, but I'm going to have to have some way to translate the addresses to system numbers, and it's going to have to be OEM for this type of hardware, and the addresses are not available at the level this is happening, this code is generic for all interface types. So what we can do, in my order of preference :-) : 1. Modify the IPMI firmware to set the device id to a unique number for every BMC in the system. It would be really nice if this was done in a way that the device ids could be correlated with physical systems. This will work with the IPMI driver as-is, and I checked and udev translations can be done as-is, too, I believe. 2. Use some OEM IPMI command that could query the physical system number, if something like this exists. 3. Create an OEM handler to use the GUID to map to physical systems. I'm going to need some help with this, I have no idea how to do this. Looking at the GUID format (Table 20-10 in the IPMI 2.0 spec), I don't see any way to do this. The node field, BTW, is supposed to be the 802.x MAC address. 4. Use the I/O address. This introduces a lot of headaches into the structure of the IPMI driver as the address has to be propagated from the interface-specific handler to the generic code, and it introduces an OEM handler. And I'll need some way to map the I/O addresses to physical systems. Any more ideas? -Corey > Regarding the ipmi device support currently being fixed at a max of 4, > the largest multi-node configuration we currently have is 8 so we would > need to have the table size bumped up to at least 8. However, for > future support, it might be useful to increase it even more (12, 16?). > I'll probably just make it a list and get rid of the table so it can be arbitrary counts. 
> Finally, I don't believe dynamic node plugging will generally be an > issue for my system since the nodes are merged at boot time rather than > being dynamically added and/or removed. > So the time is not here yet, but I'm sure it's coming someday :) I can wait on this one, then, but I decided it would be pretty easy to do through the hotplug subsystem. -Corey > Thanks very much, > > Carol Hebert > > On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote: > >> Now the driver is doing exactly what it is supposed to do, but now that >> may not be what we want. I'm not sure of the configuration of this >> system, but the information below gives me some clues. Here's my guess >> on the system: >> >> This is a NUMA system with hot-plug CPU boards. Each board has an IPMI >> controller on it. The BIOS maps the I/O address and SMBIOS tables for >> the IPMI controller to different I/O locations based upon the slot the >> board is in. There are a number of problems beyond this one for a >>
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi, I believe your assessment of my x460 dual-node system configuration is correct with the exception of maybe changing the word "slot" to "system" since the nodes are joined by scalability cables rather than being connected via a common backplane. Regarding the uniqueness of the Device ID, I think you mentioned in an earlier email that the spec was a bit contradictory on the topic. I took a look at the spec and agree that it is not at all clear whether the Device ID should be unique for all controllers or only for ones that support a different set of application commands/OEM fields. In one paragraph, it states that: "Controllers that implement identical sets of applications (sic) commands can have the same Device ID in a given system. Thus a 'standardized' controller could be produced where multiple instances of the controller are used in a system, and all have the same Device ID value. The controllers would still be differentiable by their address..." and in the *immediately following* paragraph, it states "A controller can optionally use the Device ID as an 'instance' identifier if more than one controller of that kind is used in the system." (It then goes on to say that the GUID, however, is the preferred method of uniquely identifying controllers.) Sheesh. :-} In checking out the dmidecode data, I verified that the addresses of the controllers on the multi-node system are unique and available there. So both the GUID and the address are unique for the controllers on the multi-node system whereas the Device ID is not. Can we use the GUID (maybe in some more easily digestible form) or the address instead of the Device ID? It seems like the only thing that's clear from the spec is that the Device ID's uniqueness is something we can't count on. Regarding the ipmi device support currently being fixed at a max of 4, the largest multi-node configuration we currently have is 8 so we would need to have the table size bumped up to at least 8. 
However, for future support, it might be useful to increase it even more (12, 16?). Finally, I don't believe dynamic node plugging will generally be an issue for my system since the nodes are merged at boot time rather than being dynamically added and/or removed. Thanks very much, Carol Hebert On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote: > Now the driver is doing exactly what it is supposed to do, but now that > may not be what we want. I'm not sure of the configuration of this > system, but the information below gives me some clues. Here's my guess > on the system: > > This is a NUMA system with hot-plug CPU boards. Each board has an IPMI > controller on it. The BIOS maps the I/O address and SMBIOS tables for > the IPMI controller to different I/O locations based upon the slot the > board is in. There are a number of problems beyond this one for a > configuration of this nature. I'll address those later. > > In response to your question, I believe this is exactly what the Device > ID in IPMI is intended for. Each board in the system should have a > unique device id based upon the slot it is in. Say you have an > application that monitors the CPU temperature of all the CPUs. If a > temperature goes out of range, you want to know which board that CPU is > on. And the Device ID can tell you that. The IPMI device number that > you suggest using are arbitrary, especially in a hot-plug system where > devices can come and go dynamically. > > In addition, you would probably want to be able to do udev mappings so > that the same slots appear as the same device names (slot 1 is > /dev/ipmi1, slot 2 is /dev/ipmi2, etc.). The driver needs to be able to > give udev information about the devices, and the Product ID/Device ID is > really all it's got. > > Now for the other problems: > >1. The IPMI driver doesn't current support an arbitrary number of > devices. It has a fixed table of four. I can fix this fairly > easily, though. 
I wasn't really expecting a system to be designed > like this. >2. The IPMI driver has no way to handle dynamic node plugging. I > don't know of a standard way to tell the IPMI driver: "Hey, you > have a new controller here". The driver should support adding new > devices dynamically, but I need some way to know the device is > there, or that it is going away. >3. I don't think the IPMI driver provides a way for sysfs to report > the information that udev needs to do the udev mappings properly > As always with sysfs, this is probably easy once you spend 2 days > figuring out what to do. > > Am I on the right track here?
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Now the driver is doing exactly what it is supposed to do, but now that may not be what we want. I'm not sure of the configuration of this system, but the information below gives me some clues. Here's my guess on the system: This is a NUMA system with hot-plug CPU boards. Each board has an IPMI controller on it. The BIOS maps the I/O address and SMBIOS tables for the IPMI controller to different I/O locations based upon the slot the board is in. There are a number of problems beyond this one for a configuration of this nature. I'll address those later. In response to your question, I believe this is exactly what the Device ID in IPMI is intended for. Each board in the system should have a unique device id based upon the slot it is in. Say you have an application that monitors the CPU temperature of all the CPUs. If a temperature goes out of range, you want to know which board that CPU is on. And the Device ID can tell you that. The IPMI device numbers that you suggest using are arbitrary, especially in a hot-plug system where devices can come and go dynamically. In addition, you would probably want to be able to do udev mappings so that the same slots appear as the same device names (slot 1 is /dev/ipmi1, slot 2 is /dev/ipmi2, etc.). The driver needs to be able to give udev information about the devices, and the Product ID/Device ID is really all it's got. Now for the other problems: 1. The IPMI driver doesn't currently support an arbitrary number of devices. It has a fixed table of four. I can fix this fairly easily, though. I wasn't really expecting a system to be designed like this. 2. The IPMI driver has no way to handle dynamic node plugging. I don't know of a standard way to tell the IPMI driver: "Hey, you have a new controller here". The driver should support adding new devices dynamically, but I need some way to know the device is there, or that it is going away. 3. 
I don't think the IPMI driver provides a way for sysfs to report the information that udev needs to do the udev mappings properly As always with sysfs, this is probably easy once you spend 2 days figuring out what to do. Am I on the right track here? -Corey Carol Hebert wrote: > Hi Corey, > > I'm still having problems with the new patches due to the device ID and > the Product ID being the same on each of the nodes (still have > segfault/oops). The dual node system is really two separate nodes that > are joined at will (via RSA setup). Since each began life (and can > resume life at any time) as a standalone system, isn't it reasonable > that they could have the same BMC Product and Device IDs? If not, do > you think this is something that could/should be changed/set in the BIOS > for each BMC on multi-node systems? > > Alternately, would it be possible to differentiate between the two BMCs > for sysfs file naming purposes by using the value of intf->intf_num in > ipmi_bmc_register()? I believe that's pretty similar to what's > currently done to differentiate between the ipmi.0 and ipmi.1 > interfaces. As an example, I tacked the intf_num onto the product id in > ipmi_bmc_register() (your and Jeff's patched version of the > ipmi_msghandler.c file): > > } else { > - char name[14]; > + char name[16]; > snprintf(name, sizeof(name), > - "ipmi_bmc.%4.4x", bmc->id.product_id); > + "ipmi_bmc.%4.4x%d", bmc->id.product_id, > intf->intf_num); > > and the modules loaded fine. The file names become: ipmi_bmc.00070.32 > and ipmi_bmc.00071.32 (see debug trace below). I suspect I may be > grossly oversimplifying the feasibility/usability/implementation of this > solution but at first glance/touch test, it appears to work so I thought > it might be good to discuss it. > > Anyway, thanks again for your help. Please let me know what you'd like > me to try next. 
Also, I can probably get some time on a 4-node and/or > an 8-node system so we can really stress the solution once we've settled > on a fix. > > Thanks much, > > Carol Hebert > > - > > kobject ipmi_msghandler: registering. parent: , set: module > kobject_uevent > fill_kobj_path: path = '/module/ipmi_msghandler' > kobject ipmi: registering. parent: , set: drivers > kobject_uevent > fill_kobj_path: path = '/bus/platform/drivers/ipmi' > ipmi message handler version 39.0 > kobject ipmi_devintf: registering. parent: , set: module > kobject_uevent > fill_kobj_path: path = '/module/ipmi_devintf' > ipmi device interface > subsystem ipmi: registering > kobject ipmi: registering. parent: , set: class > kobject ipmi_si: registering. parent: , set: module > kobject_uevent > fill_kobj_path: path = '/module/ipmi_si' > kobject ipmi_si: registering. parent: , set: drivers > kobject_uevent > fill_kobj_path: path = '/bus/platform/drivers/ipmi_si' > IPMI System Interface driver. > ipmi_si: Trying SMBIOS-specified KCS
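As an illustration of the first problem Corey lists (the fixed table of four interfaces), here is a minimal userspace sketch of an interface list that grows and shrinks as boards come and go. It is not the ipmi_si code: struct demo_intf, demo_intf_add(), demo_intf_remove() and demo_intf_count() are hypothetical names invented for this example, and a real driver would also need locking and some hot-plug notification source, which the message above says is still missing.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one IPMI interface; not the kernel's struct. */
struct demo_intf {
    int intf_num;            /* number handed out at registration time */
    unsigned int io_addr;    /* e.g. the KCS I/O address from SMBIOS */
    struct demo_intf *next;
};

/* A singly linked list replaces the fixed-size table. */
struct demo_intf_list {
    struct demo_intf *head;
    int next_num;            /* next interface number to hand out */
};

/* Add an interface; returns its number, or -1 if allocation fails. */
int demo_intf_add(struct demo_intf_list *list, unsigned int io_addr)
{
    struct demo_intf *intf = malloc(sizeof(*intf));

    if (!intf)
        return -1;
    intf->intf_num = list->next_num++;
    intf->io_addr = io_addr;
    intf->next = list->head;
    list->head = intf;
    return intf->intf_num;
}

/* Remove an interface by number; returns 0, or -1 if it is not present. */
int demo_intf_remove(struct demo_intf_list *list, int intf_num)
{
    struct demo_intf **pp = &list->head;

    while (*pp) {
        if ((*pp)->intf_num == intf_num) {
            struct demo_intf *dead = *pp;

            *pp = dead->next;   /* unlink before freeing */
            free(dead);
            return 0;
        }
        pp = &(*pp)->next;
    }
    return -1;
}

/* Count the interfaces currently registered. */
int demo_intf_count(const struct demo_intf_list *list)
{
    const struct demo_intf *p;
    int n = 0;

    for (p = list->head; p; p = p->next)
        n++;
    return n;
}
```

Note that the numbers handed out this way depend purely on registration order, which is the arbitrariness Corey objects to when they are used for naming.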
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Corey,

I'm still having problems with the new patches due to the device ID and the Product ID being the same on each of the nodes (still have segfault/oops). The dual node system is really two separate nodes that are joined at will (via RSA setup). Since each began life (and can resume life at any time) as a standalone system, isn't it reasonable that they could have the same BMC Product and Device IDs? If not, do you think this is something that could/should be changed/set in the BIOS for each BMC on multi-node systems?

Alternately, would it be possible to differentiate between the two BMCs for sysfs file naming purposes by using the value of intf->intf_num in ipmi_bmc_register()? I believe that's pretty similar to what's currently done to differentiate between the ipmi.0 and ipmi.1 interfaces. As an example, I tacked the intf_num onto the product id in ipmi_bmc_register() (your and Jeff's patched version of the ipmi_msghandler.c file):

        } else {
-               char name[14];
+               char name[16];
                snprintf(name, sizeof(name),
-                        "ipmi_bmc.%4.4x", bmc->id.product_id);
+                        "ipmi_bmc.%4.4x%d", bmc->id.product_id,
                         intf->intf_num);

and the modules loaded fine. The file names become: ipmi_bmc.00070.32 and ipmi_bmc.00071.32 (see debug trace below). I suspect I may be grossly oversimplifying the feasibility/usability/implementation of this solution but at first glance/touch test, it appears to work so I thought it might be good to discuss it.

Anyway, thanks again for your help. Please let me know what you'd like me to try next. Also, I can probably get some time on a 4-node and/or an 8-node system so we can really stress the solution once we've settled on a fix.

Thanks much,

Carol Hebert

-

kobject ipmi_msghandler: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_msghandler'
kobject ipmi: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi'
ipmi message handler version 39.0
kobject ipmi_devintf: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_devintf'
ipmi device interface
subsystem ipmi: registering
kobject ipmi: registering. parent: , set: class
kobject ipmi_si: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_si'
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
IPMI System Interface driver.
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0x90a8, slave address 0x20, irq 0
kobject ipmi_si.0: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.0
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
CAH: ipmi: NEW BMC: name = ipmi_bmc.00070; intf_num = 0
kobject ipmi_bmc.00070.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.00070.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.00070.32'
ipmi: Found new BMC (man_id: 0x02, prod_id: 0x0007, dev_id: 0x20)
kobject ipmi0: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi0'
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
IPMI KCS interface initialized
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8, slave address 0x20, irq 0
kobject ipmi_si.1: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.1
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
CAH: ipmi: NEW BMC: name = ipmi_bmc.00071; intf_num = 1
kobject ipmi_bmc.00071.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.00071.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.00071.32'
ipmi: Found new BMC (man_id: 0x02, prod_id: 0x0007, dev_id: 0x20)
kobject ipmi1: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi1'
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
IPMI KCS interface initialized
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/pci/drivers/ipmi_si'

On Tue, 2006-10-10 at 10:49 -0500, Corey Minyard wrote:
> Sorry, I messed up the error recovery in the previous patch. This one
> should fix it; I've simulated this and it works fine. I've also
> included a patch from Jeff Garzik that does some more cleanup. Jeff's
> patch must be applied first; it is named "ipmi-handle-sysfs-errors.patch".
>
> I'm still not sure what to do about the naming problem, though. I am
> assuming the two devices have different GUIDs, otherwise they would
> show up as the same BMC. I'd prefer to not use the GUID, as it is
> huge and meaningless to humans and applications.
>
> I re-read the section in the spec again, and I really believe it is the
> intent that different BMCs on the
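The naming change Carol describes can be tried outside the kernel. The sketch below uses a hypothetical helper, demo_bmc_name(), which is not part of ipmi_msghandler.c. It applies the same format string as her diff and additionally checks the snprintf() return value, because a 16-byte buffer only has room for interface numbers of up to two digits ("ipmi_bmc." is 9 characters, the product id is 4, and one byte is needed for the terminator).

```c
#include <stdio.h>

/*
 * Hypothetical helper, not a kernel function: build a BMC device name
 * from the product id plus the interface number, as in the diff above.
 * Returns 0 on success, -1 if the name did not fit into buf.
 */
int demo_bmc_name(char *buf, size_t len, unsigned int product_id,
                  int intf_num)
{
    int n = snprintf(buf, len, "ipmi_bmc.%4.4x%d",
                     product_id & 0xffff, intf_num);

    /* snprintf() returns the length it wanted to write, so a value
     * of len or more means the output was truncated. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

With product id 0x0007 this yields "ipmi_bmc.00070" and "ipmi_bmc.00071" for interfaces 0 and 1, matching the CAH: debug lines in the trace. The trailing ".32" in the kobject names is presumably the platform device id (dev_id 0x20 is 32 in decimal) appended when the device is registered.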
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Corey,

Thanks very much for the patch. :-) I built it and ran it on my system and it works a bit better than the original but it still has some problems. I'm attaching the dmesg output below (with a bit of debug turned on in it). With the patch, the modprobe appears to create one of the two ipmi device nodes (ipmi0) expected for the dual-node system, although modprobe of ipmi_si appears to hang.

Could you please take a look at the error messages below and see if you can spot the problem?

Thanks much again,

Carol Hebert

-

kobject ipmi_msghandler: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_msghandler'
kobject ipmi: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi'
ipmi message handler version 39.0
kobject ipmi_devintf: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_devintf'
ipmi device interface
subsystem ipmi: registering
kobject ipmi: registering. parent: , set: class
kobject ipmi_si: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_si'
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
IPMI System Interface driver.
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0x90a8, slave address 0x20, irq 0
kobject ipmi_si.0: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.0
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
kobject ipmi_bmc.0007.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.0007.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.0007.32'
ipmi: Found new BMC (man_id: 0x02, prod_id: 0x0007, dev_id: 0x20)
kobject ipmi0: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi0'
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
IPMI KCS interface initialized
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8, slave address 0x20, irq 0
kobject ipmi_si.1: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.1
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
kobject ipmi_bmc.0007.32: registering. parent: platform, set: devices
kobject_add failed for ipmi_bmc.0007.32 with -EEXIST, don't try to register things with the same name in the same directory.
 [] show_trace_log_lvl+0x58/0x16a
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] kobject_add+0x186/0x1ac
 [] device_add+0x7a/0x2de
 [] platform_device_add+0xde/0x10e
 [] platform_device_register+0x15/0x18
 [] ipmi_register_smi+0x563/0x987 [ipmi_msghandler]
 [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f/0x6db [ipmi_si]
 [] sys_init_module+0x16ad/0x1856
 [] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] kobject_add+0x186/0x1ac
 [] device_add+0x7a/0x2de
 [] platform_device_add+0xde/0x10e
 [] platform_device_register+0x15/0x18
 [] ipmi_register_smi+0x563/0x987 [ipmi_msghandler]
 [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f/0x6db [ipmi_si]
 [] sys_init_module+0x16ad/0x1856
 [] syscall_call+0x7/0xb
kobject ipmi_bmc.0007.32: cleaning up
ipmi_msghandler: Unable to register bmc device: -17
ipmi_si: Unable to register device: error -17
BUG: unable to handle kernel paging request at virtual address 6b6b6c73
 printing eip:
c04ab7f4
*pde =
Oops: [#1]
SMP
last sysfs file: /class/drm/card0/dev
Modules linked in: ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) radeon(U) drm(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) acpi_cpufreq(U) video(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) sg(U) i2c_piix4(U) ide_cd(U) i2c_core(U) aacraid(U) tg3(U) cdrom(U) serio_raw(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) aic94xx(U) libsas(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
CPU:14
EIP:0060:[]Not tainted VLI
EFLAGS: 00010212 (2.6.18-ipmipatch #3)
EIP is at sysfs_remove_link+0x1/0xd
eax: 6b6b6c43 ebx: f54c876c ecx: c042d7c9 edx: f9781b20
esi: 6b6b6b6b edi: f54c876c ebp: f4894e58 esp: f4894e48
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 5643, ti=f4894000 task=f6c5e030 task.ti=f4894000)
Stack: f4894e58 f977fec3 ffef f4894e6c f9780564 ffef dfc0db38
       ffef f4894e84 f978ef34 0118f8be 0ca8 0004 f4894eac
       f978f99e 0004 c302d700 010020ac 0ca8 f9797500 f9797500
Call Trace:
 [] ipmi_bmc_unregister+0x20/0x6e [ipmi_msghandler]
 [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
 [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hopefully the attached patch will fix the problem and clean up the error handling in this failure case.

-Corey

Carol Hebert wrote:
> Hi Corey,
>
> I believe I may have found a problem with the ipmi driver v39 in the
> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
> a dual-node x460 with two BMCs). At first glance, it appears the
> problem may be in the sysfs code added last January -- it looks like it
> may not be handling the multiple BMCs correctly. The result is that
> the ipmi_si module won't load and the ipmi device nodes don't get
> created.
>
> I'm only starting to debug the issue but wanted to let you know what
> I've seen asap in case someone's already spotted this problem but I
> missed seeing a patch and also because I'm not a sysfs expert and I
> don't know what the original intent was for how to present multiple BMCs
> (from multi-node systems) in the sysfs.
>
> I'm pasting the stack backtrace below. Please let me know if you have
> any suggestions or questions.
>
> Thanks much,
>
> Carol Hebert
>
>
> ipmi message handler version 39.0
> IPMI System Interface driver.
> ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0x90a8, slave address 0x20, irq 0
> PM: Adding info for platform:ipmi_si.0
> PM: Adding info for platform:ipmi_bmc.32
> ipmi: Found new BMC (man_id: 0x02, prod_id: 0x0007, dev_id: 0x20)
> IPMI KCS interface initialized
> ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8, slave address 0x20, irq 0
> PM: Adding info for platform:ipmi_si.1
> kobject_add failed for ipmi_bmc.32 with -EEXIST, don't try to register things with the same name in the same directory.
> [] show_trace_log_lvl+0x58/0x16a
> [] show_trace+0xd/0x10
> [] dump_stack+0x19/0x1b
> [] kobject_add+0x14b/0x171
> [] device_add+0x7a/0x2de
> [] platform_device_add+0xde/0x10e
> [] platform_device_register+0x15/0x18
> [] ipmi_register_smi+0x538/0x94a [ipmi_msghandler]
> [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
> [] init_ipmi_si+0x40f/0x6db [ipmi_si]
> [] sys_init_module+0x16ad/0x1856
> [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
> Leftover inexact backtrace:
> [] show_trace+0xd/0x10
> [] dump_stack+0x19/0x1b
> [] kobject_add+0x14b/0x171
> [] device_add+0x7a/0x2de
> [] platform_device_add+0xde/0x10e
> [] platform_device_register+0x15/0x18
> [] ipmi_register_smi+0x538/0x94a [ipmi_msghandler]
> [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
> [] init_ipmi_si+0x40f/0x6db [ipmi_si]
> [] sys_init_module+0x16ad/0x1856
> [] syscall_call+0x7/0xb
> ipmi_msghandler: Unable to register bmc device: -17
> ipmi_si: Unable to register device: error -17
> BUG: unable to handle kernel paging request at virtual address 6b6b6c73
> printing eip:
> c04aa1d4
> *pde = 6b6b6b6b
> Oops: [#1]
> SMP
> last sysfs file: /class/drm/card0/dev
> Modules linked in: ipmi_si ipmi_msghandler radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 acpi_cpufreq video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport joydev sg pcspkr tg3 aacraid i2c_piix4 i2c_core ide_cd cdrom serio_raw dm_snapshot dm_zero dm_mirror dm_mod aic94xx libsas scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
> CPU:8
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010212 (2.6.18-1.2702.el5PAE #1)
> EIP is at sysfs_remove_link+0x1/0xd
> eax: 6b6b6c43 ebx: e722ad78 ecx: c042dc05 edx: f8b0aad8
> esi: 6b6b6b6b edi: e722ad78 ebp: e7152e58 esp: e7152e48
> ds: 007b es: 007b ss: 0068
> Process modprobe (pid: 20599, ti=e7152000 task=f72b0030 task.ti=e7152000)
> Stack: e7152e58 f8b08ebf ffef e7152e6c f8b09559 ffef eeb70248
>        ffef e7152e84 f980bf34 0118c8be 0ca8 0004 e7152eac
>        f980c99e 0004 d1c2d700 010020ac 0ca8 f9814480 f9814480
> Call Trace:
> [] ipmi_bmc_unregister+0x1c/0x63 [ipmi_msghandler]
> [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
> [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
> [] init_ipmi_si+0x40f/0x6db [ipmi_si]
> [] sys_init_module+0x16ad/0x1856
> [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
> Leftover inexact backtrace:
> [] show_stack_log_lvl+0x8a/0x95
> [] show_registers+0x12d/0x19a
> [] die+0x190/0x293
> [] do_page_fault+0x4e8/0x5ba
> [] error_code+0x39/0x40
> [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
> [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
> [] init_ipmi_si+0x40f/0x6db [ipmi_si]
> [] sys_init_module+0x16ad/0x1856
> [] syscall_call+0x7/0xb
> Code: f1 f8 ff 8b 45 f0 e8 06 d0 03 00 8b 45 ec e8 fe cf 03 00 8b 55 e4 8b 4d e0 8b 41 1c 89 54 81 20 83 c4 14 31 c0 5b 5e 5f 5d c3 55 <8b> 40 30 89 e5 e8 d0 e4 ff ff 5d c3 55 89 e5 57 56 89 ce 53 83
> EIP: [] sysfs_remove_link+0x1/0xd SS:ESP 0068:e7152e48
>
>

This patch adds the product id to the driver model platform device name, in addition to the device id. The IPMI spec does not require that individual BMCs in
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
The basic problem is that platform_device_alloc() is being called with the device id, but not the product id, as part of the name. According to the spec, the combo of the two is required to be unique on a machine. But the device id is the same on both BMCs, it appears.

Carol, can you confirm that the product ids are different? They are printed at driver load time.

I'll get a patch soon.

-Corey

Yani Ioannou wrote:
> Hi Carol,
>
> On 10/6/06, Carol Hebert <[EMAIL PROTECTED]> wrote:
>
>> I believe I may have found a problem with the ipmi driver v39 in the
>> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
>> a dual-node x460 with two BMCs). At first glance, it appears the
>> problem may be in the sysfs code added last January -- it looks like it
>> may not be handling the multiple BMCs correctly. The result is that
>> the ipmi_si module won't load and the ipmi device nodes don't get
>> created.
>>
>
> I guess I shouldn't be surprised - it's very hard to find someone with
> access to a system with multiple BMCs (not just multiple interfaces)
> who is willing to test this out. I only have access to an old
> HP workstation with a rudimentary IPMI 1.0 card myself.
>
>
>> I'm only starting to debug the issue but wanted to let you know what
>> I've seen asap in case someone's already spotted this problem but I
>> missed seeing a patch and also because I'm not a sysfs expert and I
>> don't know what the original intent was for how to present multiple BMCs
>> (from multi-node systems) in the sysfs.
>>
>
> I did write the code to handle multiple BMCs, but it looks like I
> overlooked something. From your backtrace, at first glance it appears
> that some sysfs file is being duplicated in the same directory. Could
> you perhaps turn on sysfs/kobject debugging in the kernel debugging
> options?
>
> Thanks,
> Yani

___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
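To make the failure mode concrete, here is a small userspace sketch of name-based registration that rejects duplicates with -EEXIST, which is errno 17 on Linux and corresponds to the "error -17" lines in the reported logs. The registry type and the demo_register() and demo_register_bmc() functions are invented for illustration and are not kernel APIs; the name format combining product id and device id is an assumption modelled on the device names seen in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_NAMES 8
#define DEMO_NAME_LEN  32

/* Hypothetical stand-in for a directory that requires unique names. */
struct demo_registry {
    char names[DEMO_MAX_NAMES][DEMO_NAME_LEN];
    int count;
};

/* Register a name; returns 0, or a negative errno value on failure. */
int demo_register(struct demo_registry *reg, const char *name)
{
    int i;

    if (strlen(name) >= DEMO_NAME_LEN)
        return -ENAMETOOLONG;
    for (i = 0; i < reg->count; i++)
        if (strcmp(reg->names[i], name) == 0)
            return -EEXIST;     /* same name already present */
    if (reg->count == DEMO_MAX_NAMES)
        return -ENOSPC;
    strcpy(reg->names[reg->count++], name);
    return 0;
}

/* Build a name from product id and device id, then register it. */
int demo_register_bmc(struct demo_registry *reg, unsigned int product_id,
                      unsigned int device_id)
{
    char name[DEMO_NAME_LEN];

    snprintf(name, sizeof(name), "ipmi_bmc.%4.4x.%u",
             product_id & 0xffff, device_id & 0xff);
    return demo_register(reg, name);
}
```

Adding the product id to the name only helps when the product ids differ. Two BMCs that both report prod_id 0x0007 and dev_id 0x20, as Carol's system does, still produce the same name, and the second registration fails.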
Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel
Hi Carol,

On 10/6/06, Carol Hebert <[EMAIL PROTECTED]> wrote:
> I believe I may have found a problem with the ipmi driver v39 in the
> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
> a dual-node x460 with two BMCs). At first glance, it appears the
> problem may be in the sysfs code added last January -- it looks like it
> may not be handling the multiple BMCs correctly. The result is that
> the ipmi_si module won't load and the ipmi device nodes don't get
> created.

I guess I shouldn't be surprised - it's very hard to find someone with access to a system with multiple BMCs (not just multiple interfaces) who is willing to test this out. I only have access to an old HP workstation with a rudimentary IPMI 1.0 card myself.

> I'm only starting to debug the issue but wanted to let you know what
> I've seen asap in case someone's already spotted this problem but I
> missed seeing a patch and also because I'm not a sysfs expert and I
> don't know what the original intent was for how to present multiple BMCs
> (from multi-node systems) in the sysfs.

I did write the code to handle multiple BMCs, but it looks like I overlooked something. From your backtrace, at first glance it appears that some sysfs file is being duplicated in the same directory. Could you perhaps turn on sysfs/kobject debugging in the kernel debugging options?

Thanks,
Yani