Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-11-17 Thread Carol Hebert
On Fri, 2006-11-17 at 09:53 -0600, Corey Minyard wrote:

> > oopses (appended below).  Does this patch require one or more of the
> > other patches in the 39.1 set to be happy (for instance, the
> > allow-hot-smi-remove patch), or am I running into some other issue?
> >   
> I looked at this yesterday and today, and I cannot figure out what would be 
> different between the two scenarios.  I could not reproduce this, and 
> it's probably best to just take all the patches, as that is what I 
> tested.  I did try different loading orders in my testing.
> 
> I'll try to look at this again today.

I totally agree that it would be best to use all the patches rather than
pull out just a few of them, and I would always prefer to do that, but
for the particular scenario/issue I'm currently working on, that won't
be possible in the near term. :-(

I did try applying the ipmi-allow-hot-smi-remove patch along with the
other 4 patches I listed, and it did seem to fix the driver-load-order
oops.  Is that a stand-alone patch, or are there others in the set that
need to be applied along with it?  I ran some tests with this new 5-patch
subset and didn't find any problems, but my testing wasn't exhaustive, so
I'd like to confirm that taking only these 5 won't introduce an issue in
some area my testing didn't touch.

> > I got a bit of info about the order in which the SMBIOS table is
> > populated and found out that it's currently populated in order of
> > increasing KCS I/O address but that this isn't necessarily an ordering
> > scheme that can be assumed for the future.  Also, regarding changing the
> > BIOS to make the deviceID unique across BMCs, I was told that if these
> > changes were made, we would likely be facing many issues such as
> > DeviceID mismatches with what's coded up in the SDR data, etc.  So I
> > suspect it's something that might not happen anytime soon (if ever).
> >   
> That really doesn't make any sense.  The only place I could find where 
> this Device ID is used is in the type 13 SDR: "Management Controller 
> Confirmation Record".  This record is used by utility software to record 
> that it found a specific management controller in the system.  It seems 
> of limited value to me, anyway, and having different device IDs would 
> seem to make it easier, not harder, to tell the different 
> management controllers apart.  From what I can tell, the use of this is for 
> system software to record the current management controller 
> configuration.  Then if system software finds something different, it 
> can say "Hey, something changed" and handle it.
> 
> Note that the term "Device ID" is heavily overloaded in the IPMI spec.  
> It also has "FRU Device ID" and "Device ID String", but those are 
> completely different things.
> 
> I see no other reliable mechanism to correlate management controllers 
> with nodes, especially if nodes ever become dynamic.  I really doubt you 
> will have any issues unless you have software that is hardcoded to 
> handle this.  That seems unlikely, since the IDs are all the same and 
> don't provide any really useful information.  Perhaps the group doing 
> the work can suggest a reliable way to correlate the nodes and the 
> management controllers?
> 
Thanks very much for your input on this.  I'll take what you've said
back to the BIOS folks and re-open the discussion.  :-)

Thank you very much again for your ongoing and excellent help. :-)

Carol


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-11-17 Thread Corey Minyard
Carol Hebert wrote:
> Hi Corey,
>
> I wanted to let you know about some of the testing I've done with some
> of the new 39.1 patches and also to ask you about an issue I found.
>
> First, I wanted to ask you about the ipmi-remove-device-interface-limits
> patch.  It seems that when I have this patch loaded (along with just the
> 3 multinode fix patches listed below), the drivers work fine if ipmi_si
> is loaded last, but if ipmi_si is loaded before ipmi_devintf, the system
> oopses (appended below).  Does this patch require one or more of the
> other patches in the 39.1 set to be happy (for instance, the
> allow-hot-smi-remove patch), or am I running into some other issue?
>   
I looked at this yesterday and today, and I cannot figure out what would be 
different between the two scenarios.  I could not reproduce this, and 
it's probably best to just take all the patches, as that is what I 
tested.  I did try different loading orders in my testing.

I'll try to look at this again today.
> Also, I wanted to let you know that I was able to get some time on an
> 8-way node and tested the following 39.1 patches:
> ipmi-fix-device-model-name.patch
> ipmi-remove-interface-number-limits.patch
> ipmi-handle-sysfs-errors.patch
> ipmi-pass-sysfs-name-from-lower-level-driver.patch
>
> They seemed to work fine (with the drivers loaded in the "good" order
> described above).  All 8 device nodes were created and seemed to be
> equally usable.  
>   
Ok, thanks.
> I got a bit of info about the order in which the SMBIOS table is
> populated and found out that it's currently populated in order of
> increasing KCS I/O address but that this isn't necessarily an ordering
> scheme that can be assumed for the future.  Also, regarding changing the
> BIOS to make the deviceID unique across BMCs, I was told that if these
> changes were made, we would likely be facing many issues such as
> DeviceID mismatches with what's coded up in the SDR data, etc.  So I
> suspect it's something that might not happen anytime soon (if ever).
>   
That really doesn't make any sense.  The only place I could find where 
this Device ID is used is in the type 13 SDR: "Management Controller 
Confirmation Record".  This record is used by utility software to record 
that it found a specific management controller in the system.  It seems 
of limited value to me, anyway, and having different device IDs would 
seem to make it easier, not harder, to tell the different 
management controllers apart.  From what I can tell, the use of this is for 
system software to record the current management controller 
configuration.  Then if system software finds something different, it 
can say "Hey, something changed" and handle it.

Note that the term "Device ID" is heavily overloaded in the IPMI spec.  
It also has "FRU Device ID" and "Device ID String", but those are 
completely different things.

I see no other reliable mechanism to correlate management controllers 
with nodes, especially if nodes ever become dynamic.  I really doubt you 
will have any issues unless you have software that is hardcoded to 
handle this.  That seems unlikely, since the IDs are all the same and 
don't provide any really useful information.  Perhaps the group doing 
the work can suggest a reliable way to correlate the nodes and the 
management controllers?
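
To make the change-detection argument concrete, here is a small C sketch.
It is a simplified, hypothetical model rather than the type 13 SDR layout
from the IPMI spec: each controller is reduced to a slave address and a
Device ID, and struct mc_id, count_distinct() and config_changed() are
made-up names.  Assuming every node's BMC answers at the usual BMC address
0x20 and reports the same Device ID, the records collapse into a single
identity, so a swapped BMC goes unnoticed; unique Device IDs keep each
record distinguishable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified identity of a management controller.
 * A real type 13 SDR carries more fields; only two are modeled here. */
struct mc_id {
	uint8_t slave_addr;
	uint8_t device_id;
};

static int mc_id_equal(struct mc_id a, struct mc_id b)
{
	return a.slave_addr == b.slave_addr && a.device_id == b.device_id;
}

/* Number of distinct identities among n controllers (O(n^2), fine for a
 * handful of BMCs). */
static size_t count_distinct(const struct mc_id *ids, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;
		for (j = 0; j < i; j++)
			if (mc_id_equal(ids[i], ids[j]))
				break;
		if (j == i)	/* no earlier match: first time we see it */
			distinct++;
	}
	return distinct;
}

/* Set-style comparison: returns 1 ("something changed") if the counts differ
 * or a found controller has no matching recorded entry, else 0. */
static int config_changed(const struct mc_id *recorded, size_t nrec,
			  const struct mc_id *found, size_t nfound)
{
	if (nrec != nfound)
		return 1;
	for (size_t i = 0; i < nfound; i++) {
		size_t j;
		for (j = 0; j < nrec; j++)
			if (mc_id_equal(found[i], recorded[j]))
				break;
		if (j == nrec)
			return 1;
	}
	return 0;
}
```

With unique IDs, replacing the controller with ID 0x02 by one with ID 0x03
makes config_changed() return 1; with identical IDs, count_distinct()
reports one identity for two BMCs, and the same swap would not register.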

Thanks

-Corey




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-11-16 Thread Carol Hebert

Hi Corey,

I wanted to let you know about some of the testing I've done with some
of the new 39.1 patches and also to ask you about an issue I found.

First, I wanted to ask you about the ipmi-remove-device-interface-limits
patch.  It seems that when I have this patch loaded (along with just the
3 multinode fix patches listed below), the drivers work fine if ipmi_si
is loaded last, but if ipmi_si is loaded before ipmi_devintf, the system
oopses (appended below).  Does this patch require one or more of the
other patches in the 39.1 set to be happy (for instance, the
allow-hot-smi-remove patch), or am I running into some other issue?

Also, I wanted to let you know that I was able to get some time on an
8-way node and tested the following 39.1 patches:
ipmi-fix-device-model-name.patch
ipmi-remove-interface-number-limits.patch
ipmi-handle-sysfs-errors.patch
ipmi-pass-sysfs-name-from-lower-level-driver.patch

They seemed to work fine (with the drivers loaded in the "good" order
described above).  All 8 device nodes were created and seemed to be
equally usable.  

I got a bit of info about the order in which the SMBIOS table is
populated and found out that it's currently populated in order of
increasing KCS I/O address but that this isn't necessarily an ordering
scheme that can be assumed for the future.  Also, regarding changing the
BIOS to make the deviceID unique across BMCs, I was told that if these
changes were made, we would likely be facing many issues such as
DeviceID mismatches with what's coded up in the SDR data, etc.  So I
suspect it's something that might not happen anytime soon (if ever).

Anyway, hope this info is useful. 

Thanks for all your help,

Carol Hebert


Unable to handle kernel paging request at 000101ab RIP:
 [] kref_get+0xc/0x47  (<--this is where kref gets dereferenced to get refcount; RDI/RBX hold kref) 
PGD 2894c067 PUD 0
Oops:  [1] SMP
last sysfs file: /class/ipmi/ipmi0/dev
CPU 0
Modules linked in: ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U)
autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U)
cpufreq_ondemand(U) video(U) sbs(U) i2c_ec(U) i2c_core(U) button(U)
battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) ipv6(U) parport_pc(U)
lp(U) parport(U) sg(U) intel_rng(U) shpchp(U) bnx2(U) tg3(U) pcspkr(U)
serio_raw(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U)
ata_piix(U) libata(U) aacraid(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U)
ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3955, comm: modprobe Not tainted 2.6.18-2714_ipmitest #2
RIP: 0010:[]  [] kref_get+0xc/0x47
RSP: :810028bfdb68  EFLAGS: 00010292
RAX: 81003b5ee9d0 RBX: 000101ab RCX: 81003b5ee000
RDX: 81003b5ee9d7 RSI: 802dc7c7 RDI: 000101ab
RBP: 810028bfdb78 R08: 8058a8d0 R09: 
R10: 81003b5ee9d0 R11: 0020 R12: fff4
R13: 802dc7c0 R14: 81002cb95700 R15: 81003b1c3f08
FS:  2aac5240() GS:8049b000()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 000101ab CR3: 28462000 CR4: 06e0
Process modprobe (pid: 3955, threadinfo 810028bfc000, task 8100029e00c0)
Stack:  810028bfdb98 0001018f 810028bfdb98
8005b2c8
 81002cb95700 81003b5eee68 810028bfdbd8 80110830
 0001018f  810035098270 8100336204c8
Call Trace:
 [] kobject_get+0x1a/0x21
 [] sysfs_create_link+0xbb/0x116
 [] class_device_add+0x267/0x46f
 [] class_device_register+0x19/0x1d
 [] class_device_create+0xf8/0x129
 [] :ipmi_devintf:ipmi_new_smi+0x72/0x98
 [] :ipmi_msghandler:ipmi_smi_watcher_register+0xd8/0x12f
 [] :ipmi_devintf:init_ipmi_devintf+0xc7/0x100
 [] sys_init_module+0x1708/0x18cc
 [] tracesys+0xd1/0xdb
DWARF2 unwinder stuck at tracesys+0xd1/0xdb
Leftover inexact backtrace:


Code: 8b 07 85 c0 75 2e e8 fe a1 05 00 48 c7 c1 40 a5 29 80 49 89
RIP  [] kref_get+0xc/0x47
 RSP 
CR2: 000101ab

=
[ BUG: lock held at task exit time! ]
-
modprobe/3955 is exiting with locks still held!
2 locks held by modprobe/3955:
 #0:  (reg_list_mutex){--..}, at: [] mutex_lock+0x2a/0x2e
 #1:  (&sysfs_inode_imutex_key){--..}, at: [] mutex_lock+0x2a/0x2e

stack backtrace:

Call Trace:
 [] show_trace+0xae/0x336
 [] dump_stack+0x15/0x17
 [] debug_check_no_locks_held+0x87/0x8b
 [] do_exit+0x8c2/0x911
 [] do_page_fault+0x7ba/0x842
 [] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:
 [] kref_get+0xc/0x47
 [] __kmalloc+0x125/0x134
 [] kobject_get+0x1a/0x21
 [] sysfs_create_link+0xbb/0x116
 [] class_device_add+0x267/0x46f
 [] class_device_register+0x19/0x1d
 [] class_device_create+0xf8/0x129
 [] __mutex_lock_slowpath+0x248/0x261
 [] mark_held_locks+0x53/0x79
 [] mutex_lock+0x2a/0x2e
 [] __mutex_lock_slowpath+0x248/0x261
 [] debug_mutex_free_waiter+0x5a/0x5e
 [] __mutex_lock_slowp
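
One thing worth checking against this trace: modprobe (loading
ipmi_devintf) faults in kref_get() on the path init_ipmi_devintf() ->
ipmi_smi_watcher_register() -> ipmi_new_smi() -> class_device_create() ->
kobject_get(), so the device pointer handed to the watcher's new_smi
callback looks bad.  In the 10-19 version of the interface-limits patch
further down this thread, the delivery loop calls
watcher->new_smi(e->intf_num, intf->si_dev), where intf is whatever the
earlier list_for_each_entry_rcu() left behind.  When such a loop runs to
completion, its cursor points at container_of() applied to the list head
itself, not at a real interface.  That would also fit the load-order
dependence: if ipmi_devintf registers before ipmi_si has added any
interface, the delivery loop never runs.  Whether the 39.1 patch still
contains that line is an assumption to verify.  The userspace sketch below
(a hypothetical struct fake_intf and a simplified, typeof-free
for_each_entry macro standing in for the kernel's) shows where the cursor
ends up.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified list_for_each_entry(): the entry type is passed explicitly
 * instead of being derived with typeof(). */
#define for_each_entry(pos, head, type, member)                      \
	for (pos = container_of((head)->next, type, member);          \
	     &pos->member != (head);                                  \
	     pos = container_of(pos->member.next, type, member))

/* Hypothetical stand-in for struct ipmi_smi: only the fields that matter. */
struct fake_intf {
	int intf_num;
	void *si_dev;
	struct list_head link;
};

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk the whole list, count entries, and return where the cursor ends up. */
static struct fake_intf *cursor_after_full_walk(struct list_head *head,
						int *seen)
{
	struct fake_intf *intf;

	*seen = 0;
	for_each_entry(intf, head, struct fake_intf, link)
		(*seen)++;
	/* intf is now container_of(head, ...), which is NOT an entry:
	 * reading intf->si_dev here reads whatever memory sits near head. */
	return intf;
}
```

Both real entries are visited, yet the returned cursor is neither of them;
its link member is the list head itself.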

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-20 Thread Carol Hebert

On Fri, 2006-10-20 at 14:34 -0500, Corey Minyard wrote:

> >   
> Hmm, that might be harder on 2.4.  I have to review the set of patches
> to see what will go on for the 2.4 release, so I'll look at it then. 
> It seems to me that 2.4 and the multi-node beasts wouldn't be a good
> match, but if it's needed...

Thanks much. :-)  I do know of folks who run 2.4 kernels and support up
to a 4-node system.   I guess that specific configuration would work
fine with the current table code.  Although I don't know of anyone
offhand who's running 2.4 on larger configs, we do support larger
systems and we support 2.4 distros so it's not inconceivable that
someone will eventually put the two together. :-} 

Thanks much again,

Carol Hebert




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-20 Thread Corey Minyard
Carol Hebert wrote:
> On Thu, 2006-10-19 at 21:46 -0500, Corey Minyard wrote:
>   
>> .
>>
>> I'm waiting for one more patch to be finished up and tested, and I'm
>> putting out a 2.6.18 patch set.
>>
>> 
>
> That's excellent news!  I'll run the patch set on my multi-nodes as soon
> as it's out.  BTW:  I was wondering if it would be much trouble to get
> the table->list patch put into the 2.4 tree as well?  I'd be happy to
> help and would be happy to test it out on a multi-node.  :-)
>   
Hmm, that might be harder on 2.4.  I have to review the set of patches
to see what will go on for the 2.4 release, so I'll look at it then. 
It seems to me that 2.4 and the multi-node beasts wouldn't be a good
match, but if it's needed...

-Corey



Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-20 Thread Carol Hebert
On Thu, 2006-10-19 at 21:46 -0500, Corey Minyard wrote:
> .
> 
> I'm waiting for one more patch to be finished up and tested, and I'm
> putting out a 2.6.18 patch set.
> 

That's excellent news!  I'll run the patch set on my multi-nodes as soon
as it's out.  BTW:  I was wondering if it would be much trouble to get
the table->list patch put into the 2.4 tree as well?  I'd be happy to
help and would be happy to test it out on a multi-node.  :-)

Thank you very much,  :-)

Carol Hebert




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-19 Thread Corey Minyard
Carol Hebert wrote:
> Hi,
>
> Wow!  I barely hit return on my email and the patch was in my
> inbox!! :-)
>   
Well, I had it sitting there, so it was easy.  Sorry about the compile
errors, those fixes had snuck into a later patch but didn't get put into
the right place.

I'm waiting for one more patch to be finished up and tested, and I'm
putting out a 2.6.18 patch set.

-Corey




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-19 Thread Carol Hebert
Hi,

Wow!  I barely hit return on my email and the patch was in my
inbox!! :-)

I made a couple of adjustments to the patch to make my compiler happy.
In the ipmi_smi_watcher_register() routine, I deleted the "&" on
to_deliver; also, I added GFP_KERNEL as a second arg to kmalloc:

 int ipmi_smi_watcher_register(struct ipmi_smi_watcher *watcher)
 {
 	ipmi_smi_t intf;
-	struct list_head to_deliver = LIST_HEAD_INIT(&to_deliver);
+	struct list_head to_deliver = LIST_HEAD_INIT(to_deliver);
 	struct watcher_entry *e, *e2;

 	mutex_lock(&ipmi_interfaces_mutex);

 	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
 		if (intf->intf_num == -1)
 			continue;
-		e = kmalloc(sizeof(*e));
+		e = kmalloc(sizeof(*e), GFP_KERNEL);
 		if (!e)
 			goto out_err;
 		e->intf_num = intf->intf_num;
 		list_add_tail(&e->link, &to_deliver);
 	}
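
Both compiler fixes follow from how these kernel interfaces are defined.
LIST_HEAD_INIT(name) expands to { &(name), &(name) }, so it takes the list
variable itself; passing &to_deliver would produce &(&to_deliver), which is
not valid C.  kmalloc() takes a gfp_t flags argument, and GFP_KERNEL fits
here because the function already takes a mutex, so it runs where sleeping
is allowed.  The userspace sketch below re-creates a minimal list_head with
the same LIST_HEAD_INIT shape to show the initializer in use; it is a
simplified stand-in, not <linux/list.h> itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same shape as the kernel macro: it takes the variable, not its address.
 * LIST_HEAD_INIT(&x) would expand to { &(&x), &(&x) } and fail to compile. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Insert new_entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new_entry, struct list_head *head)
{
	new_entry->prev = head->prev;
	new_entry->next = head;
	head->prev->next = new_entry;
	head->prev = new_entry;
}

/* An empty list points back at itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Count entries by walking from head->next back around to head. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Used as in the patch, struct list_head to_deliver =
LIST_HEAD_INIT(to_deliver); starts out empty, and entries added with
list_add_tail() keep their insertion order.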



I ran it on my 2-node system and it seemed to work as well as the
previous table-oriented patched version (i.e. great! :-).  I'm still
working on getting an 8-node to test it on -- hopefully I'll get one
next week.

Thanks very much again,  :-)

Carol Hebert

On Thu, 2006-10-19 at 16:23 -0500, Corey Minyard wrote:
> Ok, patch is attached.
> 
> Carol Hebert wrote:
> > On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
> >   
> >> Hi Corey,
> >>
> >> This latest patch worked great on my 2-node system! :-D   I'll try to
> >> get some time on a 4-node and 8-node system asap to test it out on them
> >> as well. 
> >> 
> >
> > Oops, I guess I'll probably need that patch you were talking about
> > earlier to increase the number of supported nodes to > 4 to test the
> > 8-node system properly.  :-}  I think you mentioned changing the table
> > to a list to be able to support an arbitrary number of devices?  I was
> > wondering if you had any idea when you might be able to get a chance to
> > make that change?
> >
> > Thanks again for all your help,
> >
> > Carol Hebert
> >
> >
> 




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-19 Thread Corey Minyard
Ok, patch is attached.

Carol Hebert wrote:
> On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
>   
>> Hi Corey,
>>
>> This latest patch worked great on my 2-node system! :-D   I'll try to
>> get some time on a 4-node and 8-node system asap to test it out on them
>> as well. 
>> 
>
> Oops, I guess I'll probably need that patch you were talking about
> earlier to increase the number of supported nodes to > 4 to test the
> 8-node system properly.  :-}  I think you mentioned changing the table
> to a list to be able to support an arbitrary number of devices?  I was
> wondering if you had any idea when you might be able to get a chance to
> make that change?
>
> Thanks again for all your help,
>
> Carol Hebert
>
>

This patch removes the arbitrary limit on the number of IPMI interfaces.

Signed-off-by: Corey Minyard <[EMAIL PROTECTED]>

Index: linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
===
--- linux-2.6.18.orig/drivers/char/ipmi/ipmi_msghandler.c
+++ linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
@@ -193,6 +193,9 @@ struct ipmi_smi
 
 	struct kref refcount;
 
+	/* Used for a list of interfaces. */
+	struct list_head link;
+
 	/* The list of upper layers that are using me.  seq_lock
 	 * protects this. */
 	struct list_head users;
@@ -338,13 +341,6 @@ struct ipmi_smi
 };
 #define to_si_intf_from_dev(device) container_of(device, struct ipmi_smi, dev)
 
-/* Used to mark an interface entry that cannot be used but is not a
- * free entry, either, primarily used at creation and deletion time so
- * a slot doesn't get reused too quickly. */
-#define IPMI_INVALID_INTERFACE_ENTRY ((ipmi_smi_t) ((long) 1))
-#define IPMI_INVALID_INTERFACE(i) (((i) == NULL) \
-   || (i == IPMI_INVALID_INTERFACE_ENTRY))
-
 /**
  * The driver model view of the IPMI messaging driver.
  */
@@ -354,11 +350,8 @@ static struct device_driver ipmidriver =
 };
 static DEFINE_MUTEX(ipmidriver_mutex);
 
-#define MAX_IPMI_INTERFACES 4
-static ipmi_smi_t ipmi_interfaces[MAX_IPMI_INTERFACES];
-
-/* Directly protects the ipmi_interfaces data structure. */
-static DEFINE_SPINLOCK(interfaces_lock);
+static struct list_head ipmi_interfaces = LIST_HEAD_INIT(ipmi_interfaces);
+static DEFINE_MUTEX(ipmi_interfaces_mutex);
 
 /* List of watchers that want to know when smi's are added and
deleted. */
@@ -413,25 +406,50 @@ static void intf_free(struct kref *ref)
 	kfree(intf);
 }
 
+struct watcher_entry {
+	struct list_head link;
+	int intf_num;
+};
+
 int ipmi_smi_watcher_register(struct ipmi_smi_watcher *watcher)
 {
-	int   i;
-	unsigned long flags;
+	ipmi_smi_t intf;
+	struct list_head to_deliver = LIST_HEAD_INIT(&to_deliver);
+	struct watcher_entry *e, *e2;
+
+	mutex_lock(&ipmi_interfaces_mutex);
+
+	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
+		if (intf->intf_num == -1)
+			continue;
+		e = kmalloc(sizeof(*e));
+		if (!e)
+			goto out_err;
+		e->intf_num = intf->intf_num;
+		list_add_tail(&e->link, &to_deliver);
+	}
 
 	down_write(&smi_watchers_sem);
 	list_add(&(watcher->link), &smi_watchers);
 	up_write(&smi_watchers_sem);
-	spin_lock_irqsave(&interfaces_lock, flags);
-	for (i = 0; i < MAX_IPMI_INTERFACES; i++) {
-		ipmi_smi_t intf = ipmi_interfaces[i];
-		if (IPMI_INVALID_INTERFACE(intf))
-			continue;
-		spin_unlock_irqrestore(&interfaces_lock, flags);
-		watcher->new_smi(i, intf->si_dev);
-		spin_lock_irqsave(&interfaces_lock, flags);
+
+	mutex_unlock(&ipmi_interfaces_mutex);
+
+	list_for_each_entry_safe(e, e2, &to_deliver, link) {
+		list_del(&e->link);
+		watcher->new_smi(e->intf_num, intf->si_dev);
+		kfree(e);
 	}
-	spin_unlock_irqrestore(&interfaces_lock, flags);
+
+
 	return 0;
+
+ out_err:
+	list_for_each_entry_safe(e, e2, &to_deliver, link) {
+		list_del(&e->link);
+		kfree(e);
+	}
+	return -ENOMEM;
 }
 
 int ipmi_smi_watcher_unregister(struct ipmi_smi_watcher *watcher)
@@ -766,17 +784,19 @@ int ipmi_create_user(unsigned int   
 	if (!new_user)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&interfaces_lock, flags);
-	intf = ipmi_interfaces[if_num];
-	if ((if_num >= MAX_IPMI_INTERFACES) || IPMI_INVALID_INTERFACE(intf)) {
-		spin_unlock_irqrestore(&interfaces_lock, flags);
-		rv = -EINVAL;
-		goto out_kfree;
+	rcu_read_lock();
+	list_for_each_entry_rcu(intf, &ipmi_interfaces, link) {
+		if (intf->intf_num == if_num)
+			goto found;
 	}
+	rcu_read_unlock();
+	rv = -EINVAL;
+
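
The new ipmi_smi_watcher_register() uses a snapshot-then-deliver pattern:
it records which interfaces exist while holding ipmi_interfaces_mutex,
drops the lock, and only then calls the watcher, so new_smi() never runs
with the interface lock held.  Two details in the version above may be
worth a second look: the delivery loop passes intf->si_dev, but at that
point intf is the leftover cursor from the first loop rather than the entry
being delivered, and the out_err path returns without unlocking
ipmi_interfaces_mutex.  The userspace sketch below shows the same pattern
with both points addressed, by copying the device pointer into each
snapshot entry and unlocking on every path.  It uses pthreads and
hypothetical names (struct iface, struct pending, register_watcher(),
log_new_smi()); a driver would also need to keep each interface alive
between the snapshot and the callback (for example by taking a reference),
which the sketch leaves out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IPMI interface; not the kernel struct. */
struct iface {
	int intf_num;		/* -1 means "not ready yet", as in the patch */
	void *si_dev;		/* device handed to watchers */
	struct iface *next;
};

/* One pending delivery, captured while the lock is held. */
struct pending {
	int intf_num;
	void *si_dev;		/* captured per entry, never read from a stale cursor */
	struct pending *next;
};

typedef void (*new_smi_fn)(int intf_num, void *si_dev, void *ctx);

static pthread_mutex_t ifaces_lock = PTHREAD_MUTEX_INITIALIZER;
static struct iface *ifaces;	/* protected by ifaces_lock */

static void free_pending(struct pending *p)
{
	while (p) {
		struct pending *next = p->next;
		free(p);
		p = next;
	}
}

/* Returns 0 on success, -1 if a snapshot allocation fails. */
static int register_watcher(new_smi_fn new_smi, void *ctx)
{
	struct pending *head = NULL, **tail = &head;

	pthread_mutex_lock(&ifaces_lock);
	for (struct iface *i = ifaces; i; i = i->next) {
		if (i->intf_num == -1)
			continue;
		struct pending *p = malloc(sizeof(*p));
		if (!p) {
			pthread_mutex_unlock(&ifaces_lock);	/* unlock on error too */
			free_pending(head);
			return -1;
		}
		p->intf_num = i->intf_num;
		p->si_dev = i->si_dev;
		p->next = NULL;
		*tail = p;		/* append, keeping list order */
		tail = &p->next;
	}
	/* The real code adds the watcher to its list here, still under the
	 * lock, so interfaces added after the unlock notify it directly. */
	pthread_mutex_unlock(&ifaces_lock);

	/* Deliver outside the lock, using only what was captured. */
	for (struct pending *p = head; p; p = p->next)
		new_smi(p->intf_num, p->si_dev, ctx);
	free_pending(head);
	return 0;
}

/* Example watcher callback: records what it was given. */
struct delivery_log {
	int count;
	int nums[8];
	void *devs[8];
};

static void log_new_smi(int intf_num, void *si_dev, void *ctx)
{
	struct delivery_log *log = ctx;

	if (log->count < 8) {
		log->nums[log->count] = intf_num;
		log->devs[log->count] = si_dev;
	}
	log->count++;
}
```

Each callback receives the device that belongs to its own interface number,
and interfaces still marked -1 are skipped, as in the patch.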

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-19 Thread Carol Hebert
On Wed, 2006-10-18 at 13:37 -0700, Carol Hebert wrote:
> Hi Corey,
> 
> This latest patch worked great on my 2-node system! :-D   I'll try to
> get some time on a 4-node and 8-node system asap to test it out on them
> as well. 

Oops, I guess I'll probably need that patch you were talking about
earlier to increase the number of supported nodes to > 4 to test the
8-node system properly.  :-}  I think you mentioned changing the table
to a list to be able to support an arbitrary number of devices?  I was
wondering if you had any idea when you might be able to get a chance to
make that change?

Thanks again for all your help,

Carol Hebert




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-18 Thread Carol Hebert
On Wed, 2006-10-18 at 16:17 -0500, Corey Minyard wrote:
> Carol Hebert wrote:
> > Hi Corey,
> >
> > This latest patch worked great on my 2-node system! :-D   I'll try to
> > get some time on a 4-node and 8-node system asap to test it out on them
> > as well. 
> >
> > I've listed below how ipmi and the BMCs are now represented in sysfs.
> > Do you still want me to continue working on trying to get some unique
> > BMC device ID/GUID change made in the f/w as well (and in the process
> > find out what we have now ;-}?  I'm also working on finding out whether
> > or not it's guaranteed that the BMCs are listed in node order in the
> > SMBIOS table.
> >   
> It's probably best to get the unique device id in the firmware.  That is
> the only sure way to know that a specific IPMI device maps to a specific
> node's BMC, and IMHO it's the right way to do things.


Will do.  I'll let you know what I find out and keep you apprised of the
progress.

Thanks again,

Carol Hebert




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-18 Thread Corey Minyard
Carol Hebert wrote:
> Hi Corey,
>
> This latest patch worked great on my 2-node system! :-D   I'll try to
> get some time on a 4-node and 8-node system asap to test it out on them
> as well. 
>
> I've listed below how ipmi and the BMCs are now represented in sysfs.
> Do you still want me to continue working on trying to get some unique
> BMC device ID/GUID change made in the f/w as well (and in the process
> find out what we have now ;-}?  I'm also working on finding out whether
> or not it's guaranteed that the BMCs are listed in node order in the
> SMBIOS table.
>   
It's probably best to get the unique device id in the firmware.  That is
the only sure way to know that a specific IPMI device maps to a specific
node's BMC, and IMHO it's the right way to do things.
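
One possible reading of the sysfs listing quoted below: the two BMC devices
are named ipmi_bmc.0007.32 and ipmi_bmc.0007.33, which would fit a
<product ID>.<device ID> scheme where the second BMC's ID was bumped
because both report the same one.  That is a guess, not something taken
from the patch.  The sketch below shows such a bump-until-unique scheme
with hypothetical names (struct bmc_names, register_bmc()); the product ID
0x0007 and starting ID 32 are chosen only to reproduce those names.  It
also shows the limitation raised above: the suffix depends on registration
order, so it keeps the names apart without identifying which node a BMC
belongs to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BMCS 8

/* Hypothetical registry of BMC names already in use. */
struct bmc_names {
	size_t count;
	char names[MAX_BMCS][32];
};

static int name_taken(const struct bmc_names *r, const char *name)
{
	for (size_t i = 0; i < r->count; i++)
		if (strcmp(r->names[i], name) == 0)
			return 1;
	return 0;
}

/* Register a BMC as "ipmi_bmc.<product id, 4 hex digits>.<suffix>",
 * starting from the reported device ID and incrementing (wrapping at 256)
 * until the name is free.  Returns the suffix used, or -1 if the registry
 * is full or every suffix is taken. */
static int register_bmc(struct bmc_names *r, unsigned product_id,
			unsigned device_id)
{
	if (r->count >= MAX_BMCS)
		return -1;
	for (unsigned tries = 0; tries < 256; tries++) {
		unsigned suffix = (device_id + tries) & 0xff;
		char name[32];

		snprintf(name, sizeof(name), "ipmi_bmc.%4.4x.%u",
			 product_id, suffix);
		if (!name_taken(r, name)) {
			strcpy(r->names[r->count++], name);
			return (int)suffix;
		}
	}
	return -1;
}
```

Registering two BMCs that both report product ID 0x0007 and device ID 32
yields ipmi_bmc.0007.32 and then ipmi_bmc.0007.33; which node gets which
name depends only on probe order.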
> Thanks very much for your help and for making my day! :-D
>   
You are welcome.

-Corey
> Carol Hebert
>
> 
>
> /sys/class/ipmi/ipmi1/device
> /sys/class/ipmi/ipmi1/dev
> /sys/class/ipmi/ipmi1/uevent
> /sys/class/ipmi/ipmi1/subsystem
> /sys/class/ipmi/ipmi0
> /sys/class/ipmi/ipmi0/device
> /sys/class/ipmi/ipmi0/dev
> /sys/class/ipmi/ipmi0/uevent
> /sys/class/ipmi/ipmi0/subsystem
> /sys/bus/pci/drivers/ipmi_si
> /sys/bus/pci/drivers/ipmi_si/new_id
> /sys/bus/pci/drivers/ipmi_si/bind
> /sys/bus/pci/drivers/ipmi_si/unbind
> /sys/bus/pci/drivers/ipmi_si/module
> /sys/bus/platform/drivers/ipmi_si
> /sys/bus/platform/drivers/ipmi_si/ipmi_si.1
> /sys/bus/platform/drivers/ipmi_si/ipmi_si.0
> /sys/bus/platform/drivers/ipmi_si/bind
> /sys/bus/platform/drivers/ipmi_si/unbind
> /sys/bus/platform/drivers/ipmi
> /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.33
> /sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.32
> /sys/bus/platform/drivers/ipmi/bind
> /sys/bus/platform/drivers/ipmi/unbind
> /sys/bus/platform/devices/ipmi_bmc.0007.33
> /sys/bus/platform/devices/ipmi_si.1
> /sys/bus/platform/devices/ipmi_bmc.0007.32
> /sys/bus/platform/devices/ipmi_si.0
> /sys/devices/platform/ipmi_bmc.0007.33
> /sys/devices/platform/ipmi_bmc.0007.33/ipmi1
> /sys/devices/platform/ipmi_bmc.0007.33/guid
> /sys/devices/platform/ipmi_bmc.0007.33/aux_firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.33/product_id
> /sys/devices/platform/ipmi_bmc.0007.33/manufacturer_id
> /sys/devices/platform/ipmi_bmc.0007.33/additional_device_support
> /sys/devices/platform/ipmi_bmc.0007.33/ipmi_version
> /sys/devices/platform/ipmi_bmc.0007.33/firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.33/revision
> /sys/devices/platform/ipmi_bmc.0007.33/provides_device_sdrs
> /sys/devices/platform/ipmi_bmc.0007.33/device_id
> /sys/devices/platform/ipmi_bmc.0007.33/driver
> /sys/devices/platform/ipmi_bmc.0007.33/bus
> /sys/devices/platform/ipmi_bmc.0007.33/subsystem
> /sys/devices/platform/ipmi_bmc.0007.33/modalias
> /sys/devices/platform/ipmi_bmc.0007.33/power
> /sys/devices/platform/ipmi_bmc.0007.33/power/wakeup
> /sys/devices/platform/ipmi_bmc.0007.33/power/state
> /sys/devices/platform/ipmi_bmc.0007.33/uevent
> /sys/devices/platform/ipmi_si.1
> /sys/devices/platform/ipmi_si.1/ipmi:ipmi1
> /sys/devices/platform/ipmi_si.1/bmc
> /sys/devices/platform/ipmi_si.1/driver
> /sys/devices/platform/ipmi_si.1/bus
> /sys/devices/platform/ipmi_si.1/subsystem
> /sys/devices/platform/ipmi_si.1/modalias
> /sys/devices/platform/ipmi_si.1/power
> /sys/devices/platform/ipmi_si.1/power/wakeup
> /sys/devices/platform/ipmi_si.1/power/state
> /sys/devices/platform/ipmi_si.1/uevent
> /sys/devices/platform/ipmi_bmc.0007.32
> /sys/devices/platform/ipmi_bmc.0007.32/ipmi0
> /sys/devices/platform/ipmi_bmc.0007.32/guid
> /sys/devices/platform/ipmi_bmc.0007.32/aux_firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.32/product_id
> /sys/devices/platform/ipmi_bmc.0007.32/manufacturer_id
> /sys/devices/platform/ipmi_bmc.0007.32/additional_device_support
> /sys/devices/platform/ipmi_bmc.0007.32/ipmi_version
> /sys/devices/platform/ipmi_bmc.0007.32/firmware_revision
> /sys/devices/platform/ipmi_bmc.0007.32/revision
> /sys/devices/platform/ipmi_bmc.0007.32/provides_device_sdrs
> /sys/devices/platform/ipmi_bmc.0007.32/device_id
> /sys/devices/platform/ipmi_bmc.0007.32/driver
> /sys/devices/platform/ipmi_bmc.0007.32/bus
> /sys/devices/platform/ipmi_bmc.0007.32/subsystem
> /sys/devices/platform/ipmi_bmc.0007.32/modalias
> /sys/devices/platform/ipmi_bmc.0007.32/power
> /sys/devices/platform/ipmi_bmc.0007.32/power/wakeup
> /sys/devices/platform/ipmi_bmc.0007.32/power/state
> /sys/devices/platform/ipmi_bmc.0007.32/uevent
> /sys/devices/platform/ipmi_si.0
> /sys/devices/platform/ipmi_si.0/ipmi:ipmi0
> /sys/devices/platform/ipmi_si.0/bmc
> /sys/devices/platform/ipmi_si.0/driver
> /sys/devices/platform/ipmi_si.0/bus
> /sys/devices/platform/ipmi_si.0/subsystem
> /sys/devices/platform/ipmi_si.0/modalias
> /sys/devices/platform/ipmi_si.0/power
> /sys/devices/platform/ipmi_si.0/power/wakeup
> /sys/devices/platform/ipmi_si.0/power/state
> /sys/devices/platform/ipmi_si.0/uevent
>
>
> # ls -l /dev/ipmi*
> crw------- 1 root root 252

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-18 Thread Carol Hebert

Hi Corey,

This latest patch worked great on my 2-node system! :-D   I'll try to
get some time on a 4-node and 8-node system asap to test it out on them
as well. 

I've listed below how ipmi and the BMCs are now represented in sysfs.
Do you still want me to keep working on getting a unique BMC device
ID/GUID change made in the f/w as well (and in the process find out
what we have now ;-})?  I'm also working on finding out whether
or not it's guaranteed that the BMCs are listed in node order in the
SMBIOS table.

Thanks very much for your help and for making my day! :-D

Carol Hebert



/sys/class/ipmi/ipmi1/device
/sys/class/ipmi/ipmi1/dev
/sys/class/ipmi/ipmi1/uevent
/sys/class/ipmi/ipmi1/subsystem
/sys/class/ipmi/ipmi0
/sys/class/ipmi/ipmi0/device
/sys/class/ipmi/ipmi0/dev
/sys/class/ipmi/ipmi0/uevent
/sys/class/ipmi/ipmi0/subsystem
/sys/bus/pci/drivers/ipmi_si
/sys/bus/pci/drivers/ipmi_si/new_id
/sys/bus/pci/drivers/ipmi_si/bind
/sys/bus/pci/drivers/ipmi_si/unbind
/sys/bus/pci/drivers/ipmi_si/module
/sys/bus/platform/drivers/ipmi_si
/sys/bus/platform/drivers/ipmi_si/ipmi_si.1
/sys/bus/platform/drivers/ipmi_si/ipmi_si.0
/sys/bus/platform/drivers/ipmi_si/bind
/sys/bus/platform/drivers/ipmi_si/unbind
/sys/bus/platform/drivers/ipmi
/sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.33
/sys/bus/platform/drivers/ipmi/ipmi_bmc.0007.32
/sys/bus/platform/drivers/ipmi/bind
/sys/bus/platform/drivers/ipmi/unbind
/sys/bus/platform/devices/ipmi_bmc.0007.33
/sys/bus/platform/devices/ipmi_si.1
/sys/bus/platform/devices/ipmi_bmc.0007.32
/sys/bus/platform/devices/ipmi_si.0
/sys/devices/platform/ipmi_bmc.0007.33
/sys/devices/platform/ipmi_bmc.0007.33/ipmi1
/sys/devices/platform/ipmi_bmc.0007.33/guid
/sys/devices/platform/ipmi_bmc.0007.33/aux_firmware_revision
/sys/devices/platform/ipmi_bmc.0007.33/product_id
/sys/devices/platform/ipmi_bmc.0007.33/manufacturer_id
/sys/devices/platform/ipmi_bmc.0007.33/additional_device_support
/sys/devices/platform/ipmi_bmc.0007.33/ipmi_version
/sys/devices/platform/ipmi_bmc.0007.33/firmware_revision
/sys/devices/platform/ipmi_bmc.0007.33/revision
/sys/devices/platform/ipmi_bmc.0007.33/provides_device_sdrs
/sys/devices/platform/ipmi_bmc.0007.33/device_id
/sys/devices/platform/ipmi_bmc.0007.33/driver
/sys/devices/platform/ipmi_bmc.0007.33/bus
/sys/devices/platform/ipmi_bmc.0007.33/subsystem
/sys/devices/platform/ipmi_bmc.0007.33/modalias
/sys/devices/platform/ipmi_bmc.0007.33/power
/sys/devices/platform/ipmi_bmc.0007.33/power/wakeup
/sys/devices/platform/ipmi_bmc.0007.33/power/state
/sys/devices/platform/ipmi_bmc.0007.33/uevent
/sys/devices/platform/ipmi_si.1
/sys/devices/platform/ipmi_si.1/ipmi:ipmi1
/sys/devices/platform/ipmi_si.1/bmc
/sys/devices/platform/ipmi_si.1/driver
/sys/devices/platform/ipmi_si.1/bus
/sys/devices/platform/ipmi_si.1/subsystem
/sys/devices/platform/ipmi_si.1/modalias
/sys/devices/platform/ipmi_si.1/power
/sys/devices/platform/ipmi_si.1/power/wakeup
/sys/devices/platform/ipmi_si.1/power/state
/sys/devices/platform/ipmi_si.1/uevent
/sys/devices/platform/ipmi_bmc.0007.32
/sys/devices/platform/ipmi_bmc.0007.32/ipmi0
/sys/devices/platform/ipmi_bmc.0007.32/guid
/sys/devices/platform/ipmi_bmc.0007.32/aux_firmware_revision
/sys/devices/platform/ipmi_bmc.0007.32/product_id
/sys/devices/platform/ipmi_bmc.0007.32/manufacturer_id
/sys/devices/platform/ipmi_bmc.0007.32/additional_device_support
/sys/devices/platform/ipmi_bmc.0007.32/ipmi_version
/sys/devices/platform/ipmi_bmc.0007.32/firmware_revision
/sys/devices/platform/ipmi_bmc.0007.32/revision
/sys/devices/platform/ipmi_bmc.0007.32/provides_device_sdrs
/sys/devices/platform/ipmi_bmc.0007.32/device_id
/sys/devices/platform/ipmi_bmc.0007.32/driver
/sys/devices/platform/ipmi_bmc.0007.32/bus
/sys/devices/platform/ipmi_bmc.0007.32/subsystem
/sys/devices/platform/ipmi_bmc.0007.32/modalias
/sys/devices/platform/ipmi_bmc.0007.32/power
/sys/devices/platform/ipmi_bmc.0007.32/power/wakeup
/sys/devices/platform/ipmi_bmc.0007.32/power/state
/sys/devices/platform/ipmi_bmc.0007.32/uevent
/sys/devices/platform/ipmi_si.0
/sys/devices/platform/ipmi_si.0/ipmi:ipmi0
/sys/devices/platform/ipmi_si.0/bmc
/sys/devices/platform/ipmi_si.0/driver
/sys/devices/platform/ipmi_si.0/bus
/sys/devices/platform/ipmi_si.0/subsystem
/sys/devices/platform/ipmi_si.0/modalias
/sys/devices/platform/ipmi_si.0/power
/sys/devices/platform/ipmi_si.0/power/wakeup
/sys/devices/platform/ipmi_si.0/power/state
/sys/devices/platform/ipmi_si.0/uevent


# ls -l /dev/ipmi*
crw------- 1 root root 252, 0 Oct 18 11:52 /dev/ipmi0
crw------- 1 root root 252, 1 Oct 18 11:52 /dev/ipmi1


On Tue, 2006-10-17 at 17:22 -0500, Corey Minyard wrote:
> Corey Minyard wrote:
> >
> >> Please let me know what I can do to help.  In the meantime, I'll take a
> >> look at the current code and try to figure out why it's still oopsing. 
> >> 
> > I thought the oops was fixed.  If not, can you send one?
> >
> > As far as things you can do, I'm not really sure.  I don't have en

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-17 Thread Corey Minyard
Corey Minyard wrote:
>
>> Please let me know what I can do to help.  In the meantime, I'll take a
>> look at the current code and try to figure out why it's still oopsing. 
>> 
> I thought the oops was fixed.  If not, can you send one?
>
> As far as things you can do, I'm not really sure.  I don't have enough
> details on how this hardware works to design a solution.  This is really
> nitty-gritty detail information, like how the nodes map their BMC
> addresses and how the SMBIOS table is populated.  If the BMCs appeared
> in the SMBIOS tables in node order, then the solution is very easy, just
> detect and add 1 for each.  I could just print a warning at startup when
> it detects this and it would probably cover a multitude of future evils :-).
>
>   
I thought about this some more, and I believe it's a good idea to do
this.  The patch was easy, and I have tested it using a simulator.

Note that I found a bug in the product id stuff.  It may be that your
BMCs don't have a *device* GUID, or at least not a unique device GUID.
(Note that a device GUID is different from a system GUID, and your
system may only have a system GUID.  The system GUID is supposed to be
the same for the entire system, but each BMC is supposed to have its
own unique device GUID if it supports one.)  I was passing a 16-bit
value as an unsigned char in the compare routine, so it never matched
based on product/device id.  So with this patch, either you will get
the previous behavior (if your system supports device GUIDs) or all
the BMCs will appear to be a single BMC with multiple interfaces to
it (if device GUIDs are not supported).
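The compare-routine bug described above is an ordinary C narrowing conversion: a 16-bit product id passed through an `unsigned char` parameter keeps only its low byte. The userspace sketch below illustrates that effect; it is not the driver code, the struct and function names are hypothetical, and the product id 0x1234 is a made-up value chosen because it does not fit in 8 bits.

```c
/* Hypothetical stand-in for the driver's product id / device id pair. */
struct prod_dev_id {
	unsigned int product_id;	/* IPMI product id: 16 bits */
	unsigned char device_id;	/* IPMI device id: 8 bits */
};

/*
 * Narrow variant: product_id is an unsigned char, so any product id
 * above 0xff is truncated to its low byte at the call site and can
 * never equal the stored 16-bit value.
 */
int match_narrow(const struct prod_dev_id *bmc,
		 unsigned char product_id, unsigned char device_id)
{
	return bmc->product_id == product_id
		&& bmc->device_id == device_id;
}

/* Wide variant: the parameter can hold the full 16-bit product id. */
int match_wide(const struct prod_dev_id *bmc,
	       unsigned int product_id, unsigned char device_id)
{
	return bmc->product_id == product_id
		&& bmc->device_id == device_id;
}
```

With a product id of 0x1234, `match_narrow` sees 0x34 and reports no match, while `match_wide` matches. Product ids of 0xff or below are unaffected by the truncation.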

This patch replaces the previous one I sent you.

-Corey

This patch adds the product id to the driver model platform device
name, in addition to the device id.  The IPMI spec does not require
that individual BMCs in a system have unique device IDs, but it
does require that the product id/device id combination be unique.

This also removes a redundant check and cleans up error handling
when the sysfs registration fails.  It also passes in the sysfs
name from the lower-level driver, as the coming IPMI serial driver
will need that to link properly from the serial device sysfs
directory.
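The sysfs listing above shows BMC platform devices named `ipmi_bmc.0007.32` and `ipmi_bmc.0007.33`. A minimal sketch of composing such a name, assuming (from those names, not from the patch text) that the product id is printed as four hex digits and the device id as a decimal suffix. `format_bmc_name` is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build a name such as "ipmi_bmc.0007.32" from
 * the IPMI product id (16 bits, four hex digits) and device id
 * (8 bits, decimal).  Returns the snprintf() result, i.e. the length
 * the full name needs, excluding the terminating NUL.
 */
int format_bmc_name(char *buf, size_t len,
		    unsigned int product_id, unsigned char device_id)
{
	return snprintf(buf, len, "ipmi_bmc.%4.4x.%u",
			product_id & 0xffffu, (unsigned int)device_id);
}
```

Because the product id is part of the name, two BMCs with the same device id but different product ids get distinct names.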

Index: linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
===================================================================
--- linux-2.6.18.orig/drivers/char/ipmi/ipmi_msghandler.c
+++ linux-2.6.18/drivers/char/ipmi/ipmi_msghandler.c
@@ -202,6 +202,7 @@ struct ipmi_smi
 
 	struct bmc_device *bmc;
 	char *my_dev_name;
+	char *sysfs_name;
 
 	/* This is the lower-layer's sender routine. */
 	struct ipmi_smi_handlers *handlers;
@@ -1807,13 +1808,12 @@ static int __find_bmc_prod_dev_id(struct
 	struct bmc_device *bmc = dev_get_drvdata(dev);
 
 	return (bmc->id.product_id == id->product_id
-		&& bmc->id.product_id == id->product_id
 		&& bmc->id.device_id == id->device_id);
 }
 
 static struct bmc_device *ipmi_find_bmc_prod_dev_id(
 	struct device_driver *drv,
-	unsigned char product_id, unsigned char device_id)
+	unsigned int product_id, unsigned char device_id)
 {
 	struct prod_dev_id id = {
 		.product_id = product_id,
@@ -1930,6 +1930,9 @@ static ssize_t guid_show(struct device *
 
 static void remove_files(struct bmc_device *bmc)
 {
+	if (!bmc->dev)
+		return;
+
 	device_remove_file(&bmc->dev->dev,
 			   &bmc->device_id_attr);
 	device_remove_file(&bmc->dev->dev,
@@ -1963,7 +1966,8 @@ cleanup_bmc_device(struct kref *ref)
 	bmc = container_of(ref, struct bmc_device, refcount);
 
 	remove_files(bmc);
-	platform_device_unregister(bmc->dev);
+	if (bmc->dev)
+		platform_device_unregister(bmc->dev);
 	kfree(bmc);
 }
 
@@ -1971,7 +1975,11 @@ static void ipmi_bmc_unregister(ipmi_smi
 {
 	struct bmc_device *bmc = intf->bmc;
 
-	sysfs_remove_link(&intf->si_dev->kobj, "bmc");
+	if (intf->sysfs_name) {
+		sysfs_remove_link(&intf->si_dev->kobj, intf->sysfs_name);
+		kfree(intf->sysfs_name);
+		intf->sysfs_name = NULL;
+	}
 	if (intf->my_dev_name) {
 		sysfs_remove_link(&bmc->dev->dev.kobj, intf->my_dev_name);
 		kfree(intf->my_dev_name);
@@ -1980,6 +1988,7 @@ static void ipmi_bmc_unregister(ipmi_smi
 
 	mutex_lock(&ipmidriver_mutex);
 	kref_put(&bmc->refcount, cleanup_bmc_device);
+	intf->bmc = NULL;
 	mutex_unlock(&ipmidriver_mutex);
 }
 
@@ -1987,6 +1996,56 @@ static int create_files(struct bmc_devic
 {
 	int err;
 
+	bmc->device_id_attr.attr.name = "device_id";
+	bmc->device_id_attr.attr.owner = THIS_MODULE;
+	bmc->device_id_attr.attr.mode = S_IRUGO;
+	bmc->device_id_attr.show = device_id_show;
+
+	bmc->provides_dev_sdrs_attr.attr.name = "provides_device_sdrs";
+	bmc->provides_dev_sdrs_attr.attr.owner = THIS_MODULE;
+	bmc->provides_dev_sdrs_attr.attr.mode = S_IRUGO;
+	bmc->provides_dev_sdrs_attr.show = provides_dev_sdrs_show;
+
+	bmc->revision_attr.attr.name = "revision";
+	bmc->revision_attr.attr.owner = THIS_MODULE;
+	bmc->revision_attr.attr.mode = S_IRUGO;
+	bmc->revision_attr.show = revision_show;
+
+	bmc->firmware_rev_at

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-17 Thread Corey Minyard
Carol Hebert wrote:
> Hi Corey,
>
> Sorry I wasn't able to reply sooner.  I wanted to discuss this with a
> couple of other folks first.
>
> As per your first solution listed below, I'm going to propose asap that
> we modify the f/w to ensure that the device ID is unique for every BMC.
> I don't know yet if the proposal will be accepted, but assuming it is,
> it would solve the problem for the long term.  However, maybe we ought
> to supplement that solution with one of the other solutions you listed
> (or some combination thereof) to solve the problem in the interim until
> the new f/w is released and also in general for users running the older
> (current) f/w?
>   
OK, that's probably good for the short term.  Do you know if devices with
different firmware might mix?  That might end up being messy.
> Creating an internal mapping table of the different BMCs found at probe
> time (and maybe setting the device ID to be an index into the table)
> might be useful.  Using the GUID (as you imply) would be messy, however,
> not saving the GUID/address in some form for later use might make it
> difficult to know for sure where each BMC is physically located unless
> it's guaranteed that the BMCs are probed in order.  Do you know if
> there's a guarantee that the BMCs are probed sequentially in a multi-BMC
> system? 
>   
BMCs are probed in the order that they appear in the SMBIOS/ACPI tables. 
Note that I have a new patch to allow hot adding/removing BMCs.  You
might be able to move the problem to userspace and handle it there.
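One way to handle this in userspace is for a tool to recover the product id and device id from the `ipmi_bmc.*` directory names under `/sys/devices/platform`. The sketch below only parses a single name. `parse_bmc_name` is hypothetical, and the name format is assumed from the sysfs listing above (four hex digits of product id, then a decimal device id).

```c
#include <stdio.h>

/*
 * Hypothetical helper: split a BMC platform device name such as
 * "ipmi_bmc.0007.32" into its product id (hex) and device id
 * (decimal).  Returns 0 on success, -1 if the name has another shape.
 */
int parse_bmc_name(const char *name, unsigned int *product_id,
		   unsigned int *device_id)
{
	if (sscanf(name, "ipmi_bmc.%4x.%u", product_id, device_id) != 2)
		return -1;
	return 0;
}
```

A tool could apply this to each entry returned by readdir(3) on `/sys/devices/platform` and keep its own mapping from BMC to physical node.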
> Please let me know what I can do to help.  In the meantime, I'll take a
> look at the current code and try to figure out why it's still oopsing.
>   
I thought the oops was fixed.  If not, can you send one?

As far as things you can do, I'm not really sure.  I don't have enough
details on how this hardware works to design a solution.  This is really
nitty-gritty detail information, like how the nodes map their BMC
addresses and how the SMBIOS table is populated.  If the BMCs appeared
in the SMBIOS tables in node order, then the solution is very easy: just
detect the duplicates and add 1 to the device ID for each.  I could just print a warning at startup when
it detects this and it would probably cover a multitude of future evils :-).
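The "detect and add 1" workaround can be sketched in userspace as follows. This is a hypothetical illustration, not the patch from this thread. The array of already-registered BMCs and the function names are assumptions. The resulting ids only follow node order if the BMCs are registered in node order.

```c
#include <stdio.h>

struct bmc_id {
	unsigned int product_id;
	unsigned char device_id;
};

/* Return 1 if (product_id, device_id) is already taken by a registered BMC. */
static int id_in_use(const struct bmc_id *seen, int nseen,
		     unsigned int product_id, unsigned char device_id)
{
	for (int i = 0; i < nseen; i++)
		if (seen[i].product_id == product_id &&
		    seen[i].device_id == device_id)
			return 1;
	return 0;
}

/*
 * Hypothetical workaround: while the new BMC collides with a registered
 * one, bump its device id by one.  The 8-bit id wraps at 255; return -1
 * if every value is taken, 0 otherwise.  Warns once per call.
 */
int assign_unique_device_id(const struct bmc_id *seen, int nseen,
			    struct bmc_id *new_bmc)
{
	unsigned char orig = new_bmc->device_id;
	int warned = 0;

	while (id_in_use(seen, nseen, new_bmc->product_id,
			 new_bmc->device_id)) {
		if (!warned) {
			fprintf(stderr, "duplicate BMC: prod id 0x%x dev id 0x%x,"
				" incrementing device id\n",
				new_bmc->product_id,
				(unsigned int)new_bmc->device_id);
			warned = 1;
		}
		new_bmc->device_id++;		/* unsigned char: 255 wraps to 0 */
		if (new_bmc->device_id == orig)
			return -1;		/* out of device ids */
	}
	return 0;
}
```

If two BMCs both reported device id 32 with product id 0x0007, this scheme would yield 32 and 33. That is consistent with the `ipmi_bmc.0007.32` and `ipmi_bmc.0007.33` entries in the listing above.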

-Corey
> Thanks for your help,
>
> Carol Hebert
>
>   
>>>   
>>>   
>> The "easily digestible form" part is the problem here.  You need some
>> method to correlate a GUID to something a human being can use to
>> identify a system, and it would be nice if it wasn't custom for every
>> installed system out there.  The address is perhaps better, but I'm
>> going to have to have some way to translate the addresses to system
>> numbers, and it's going to have to be OEM for this type of hardware, and
>> the addresses are not available at the level this is happening, this
>> code is generic for all interface types.
>>
>> So what we can do, in my order of preference :-) :
>>
>>1. Modify the IPMI firmware to set the device id to a unique number
>>   for every BMC in the system.  It would be really nice if this was
>>   done in a way that the device ids could be correlated with
>>   physical systems.  This will work with the IPMI driver as-is, and
>>   I checked and udev translations can be done as-is, too, I believe.
>>2. Use some OEM IPMI command that could query the physical system
>>   number, if something like this exists.
>>3. Create an OEM handler to use the GUID to map to physical systems. 
>>   I'm going to need some help with this, I have no idea how to do
>>   this.  Looking at the GUID format (Table 20-10 in the IPMI 2.0
>>   spec), I don't see any way to do this.  The node field, BTW, is
>>   supposed to be the 802.x MAC address.
>>4. Use the I/O address.  This introduces a lot of headaches into the
>>   structure of the IPMI driver as the address has to be propagated
>>   from the interface-specific handler to the generic code, and it
>>   introduces an OEM handler.  And I'll need some way to map the I/O
>>   addresses to physical systems.
>>
>> Any more ideas?
>>
>> -Corey
>> 
>>> Regarding the ipmi device support currently being fixed at a max of 4,
>>> the largest multi-node configuration we currently have is 8 so we would
>>> need to have the table size bumped up to at least 8.  However, for
>>> future support, it might be useful to increase it even more (12, 16?).
>>>   
>>>   
>> I'll probably just make it a list and get rid of the table so it can be
>> arbitrary counts.
>> 
>>> Finally, I don't believe dynamic node plugging will generally be an
>>> issue for my system since the nodes are merged at boot time rather than
>>> being dynamically added and/or removed.
>>>   
>>>   
>> So the time is not here yet, but I'm sure it's coming someday :)  I can
>> wait on this one, then, but I decided it would be pretty easy to do
>> through the hotplug subsystem.
>>
>> -Corey
>> 
>>> Thanks very much,

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-17 Thread Carol Hebert

Hi Corey,

Sorry I wasn't able to reply sooner.  I wanted to discuss this with a
couple of other folks first.

As per your first solution listed below, I'm going to propose asap that
we modify the f/w to ensure that the device ID is unique for every BMC.
I don't know yet if the proposal will be accepted, but assuming it is,
it would solve the problem for the long term.  However, maybe we ought
to supplement that solution with one of the other solutions you listed
(or some combination thereof) to solve the problem in the interim until
the new f/w is released and also in general for users running the older
(current) f/w?

Creating an internal mapping table of the different BMCs found at probe
time (and maybe setting the device ID to be an index into the table)
might be useful.  Using the GUID (as you imply) would be messy, however,
not saving the GUID/address in some form for later use might make it
difficult to know for sure where each BMC is physically located unless
it's guaranteed that the BMCs are probed in order.  Do you know if
there's a guarantee that the BMCs are probed sequentially in a multi-BMC
system? 

Please let me know what I can do to help.  In the meantime, I'll take a
look at the current code and try to figure out why it's still oopsing.

Thanks for your help,

Carol Hebert

> >   
> The "easily digestible form" part is the problem here.  You need some
> method to correlate a GUID to something a human being can use to
> identify a system, and it would be nice if it wasn't custom for every
> installed system out there.  The address is perhaps better, but I'm
> going to have to have some way to translate the addresses to system
> numbers, and it's going to have to be OEM for this type of hardware, and
> the addresses are not available at the level this is happening, this
> code is generic for all interface types.
> 
> So what we can do, in my order of preference :-) :
> 
>1. Modify the IPMI firmware to set the device id to a unique number
>   for every BMC in the system.  It would be really nice if this was
>   done in a way that the device ids could be correlated with
>   physical systems.  This will work with the IPMI driver as-is, and
>   I checked and udev translations can be done as-is, too, I believe.
>2. Use some OEM IPMI command that could query the physical system
>   number, if something like this exists.
>3. Create an OEM handler to use the GUID to map to physical systems. 
>   I'm going to need some help with this, I have no idea how to do
>   this.  Looking at the GUID format (Table 20-10 in the IPMI 2.0
>   spec), I don't see any way to do this.  The node field, BTW, is
>   supposed to be the 802.x MAC address.
>4. Use the I/O address.  This introduces a lot of headaches into the
>   structure of the IPMI driver as the address has to be propagated
>   from the interface-specific handler to the generic code, and it
>   introduces an OEM handler.  And I'll need some way to map the I/O
>   addresses to physical systems.
> 
> Any more ideas?
> 
> -Corey
> > Regarding the ipmi device support currently being fixed at a max of 4,
> > the largest multi-node configuration we currently have is 8 so we would
> > need to have the table size bumped up to at least 8.  However, for
> > future support, it might be useful to increase it even more (12, 16?).
> >   
> I'll probably just make it a list and get rid of the table so it can be
> arbitrary counts.
> > Finally, I don't believe dynamic node plugging will generally be an
> > issue for my system since the nodes are merged at boot time rather than
> > being dynamically added and/or removed.
> >   
> So the time is not here yet, but I'm sure it's coming someday :)  I can
> wait on this one, then, but I decided it would be pretty easy to do
> through the hotplug subsystem.
> 
> -Corey
> > Thanks very much,
> >
> > Carol Hebert
> >
> > On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote:
> >   
> >> Now the driver is doing exactly what it is supposed to do, but now that
> >> may not be what we want.  I'm not sure of the configuration of this
> >> system, but the information below gives me some clues.  Here's my guess
> >> on the system:
> >>
> >> This is a NUMA system with hot-plug CPU boards.  Each board has an IPMI
> >> controller on it.  The BIOS maps the I/O address and SMBIOS tables for
> >> the IPMI controller to different I/O locations based upon the slot the
> >> board is in.  There are a number of problems beyond this one for a
> >> configuration of this nature.  I'll address those later.
> >>
> >> In response to your question, I believe this is exactly what the Device
> >> ID in IPMI is intended for.  Each board in the system should have a
> >> unique device id based upon the slot it is in.  Say you have an
> >> application that monitors the CPU temperature of all the CPUs.  If a
> >> temperature goes out of range, you want to know which board that CPU is
> >> on

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-11 Thread Corey Minyard
Carol Hebert wrote:
> Hi,
>
> I believe your assessment of my x460 dual-node system configuration is
> correct with the exception of maybe changing the word "slot" to "system"
> since the nodes are joined by scalability cables rather than being
> connected via a common backplane.
>
> Regarding the uniqueness of the Device ID, I think you mentioned in an
> earlier email that the spec was a bit contradictory on the topic.  I
> took a look at the spec and agree that it is not at all clear whether
> the Device ID should be unique for all controllers or only for ones that
> support a different set of application commands/OEM fields.  In one
> paragraph, it states that:
>
> "Controllers that implement identical sets of applications (sic)
> commands can have the same Device ID in a given system. Thus a
> 'standardized' controller could be produced where multiple instances of
> the controller are used in a system, and all have the same Device ID
> value.  The controllers would still be differentiable by their
> address..."  
>
> and in the *immediately following* paragraph, it states:
>
> "A controller can optionally use the Device ID as an 'instance'
> identifier if more than one controller of that kind is used in the
> system."   (It then goes on to say that the GUID, however, is the
> preferred method of uniquely identifying controllers.)
>
> Sheesh.  :-}  
>
> In checking out the dmidecode data, I verified that the addresses of the
> controllers on the multi-node system are unique and available there.  So
> both the GUID and the address are unique for the controllers on the
> multi-node system whereas the Device ID is not.  Can we use the GUID
> (maybe in some more easily digestible form) or the address instead of
> the Device ID?   It seems like the only thing that's clear from the spec
> is that the Device ID's uniqueness is something we can't count on.
>   
The "easily digestible form" part is the problem here.  You need some
method to correlate a GUID to something a human being can use to
identify a system, and it would be nice if it wasn't custom for every
installed system out there.  The address is perhaps better, but I'm
going to have to have some way to translate the addresses to system
numbers, and it's going to have to be OEM for this type of hardware, and
the addresses are not available at the level this is happening, this
code is generic for all interface types.

So what we can do, in my order of preference :-) :

   1. Modify the IPMI firmware to set the device id to a unique number
      for every BMC in the system.  It would be really nice if this was
      done in a way that the device ids could be correlated with
      physical systems.  This will work with the IPMI driver as-is, and
      I checked and udev translations can be done as-is, too, I believe.
   2. Use some OEM IPMI command that could query the physical system
      number, if something like this exists.
   3. Create an OEM handler to use the GUID to map to physical systems. 
      I'm going to need some help with this, I have no idea how to do
      this.  Looking at the GUID format (Table 20-10 in the IPMI 2.0
      spec), I don't see any way to do this.  The node field, BTW, is
      supposed to be the 802.x MAC address.
   4. Use the I/O address.  This introduces a lot of headaches into the
      structure of the IPMI driver as the address has to be propagated
      from the interface-specific handler to the generic code, and it
      introduces an OEM handler.  And I'll need some way to map the I/O
      addresses to physical systems.
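Option 4 needs some mapping from a BMC's I/O address to a physical node. Here is a minimal sketch. It assumes that nodes are numbered in order of increasing KCS I/O address, the current SMBIOS ordering reported in the November 17 message above, which that message says cannot be relied on in the future. The function name is hypothetical.

```c
#include <stdlib.h>

/* Compare two I/O port addresses for qsort(). */
static int cmp_addr(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical mapping: return the node number of io_addr, i.e. its
 * rank among all BMC I/O addresses (0 = lowest address), or -1 if it
 * is not in the list or count exceeds the 64-entry scratch copy.
 */
int node_for_io_addr(const unsigned long *addrs, size_t count,
		     unsigned long io_addr)
{
	unsigned long sorted[64];

	if (count > 64)
		return -1;
	for (size_t i = 0; i < count; i++)
		sorted[i] = addrs[i];
	qsort(sorted, count, sizeof(sorted[0]), cmp_addr);
	for (size_t i = 0; i < count; i++)
		if (sorted[i] == io_addr)
			return (int)i;
	return -1;
}
```

As Corey notes, the addresses are not visible in the generic driver code, so a mapping like this would have to live in an interface-specific or OEM layer, or in userspace.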

Any more ideas?

-Corey
> Regarding the ipmi device support currently being fixed at a max of 4,
> the largest multi-node configuration we currently have is 8 so we would
> need to have the table size bumped up to at least 8.  However, for
> future support, it might be useful to increase it even more (12, 16?).
>   
I'll probably just make it a list and get rid of the table so it can be
arbitrary counts.
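Replacing the fixed table of four interfaces with a list removes the upper bound on the number of BMCs. Below is a minimal userspace sketch of the idea using a hand-rolled singly linked list. A kernel version would more likely use `struct list_head` from `<linux/list.h>`. The struct and function names are hypothetical. Reusing the lowest free interface number is a design choice of this sketch, not something stated in the thread.

```c
#include <stdlib.h>

/* Hypothetical interface record; a real driver would keep far more state. */
struct intf {
	int intf_num;
	struct intf *next;
};

/* Head of the interface list, kept sorted by intf_num; no fixed maximum. */
static struct intf *intf_list;

/* Add an interface with the lowest unused number; return it, or -1 on OOM. */
int intf_add(void)
{
	struct intf *n, **pp;
	int num = 0;

	/* Walk past the dense prefix 0, 1, 2, ... to find the first gap. */
	for (pp = &intf_list; *pp && (*pp)->intf_num == num; pp = &(*pp)->next)
		num++;
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->intf_num = num;
	n->next = *pp;
	*pp = n;
	return num;
}

/* Remove interface num; return 0 if found, -1 otherwise. */
int intf_remove(int num)
{
	for (struct intf **pp = &intf_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->intf_num == num) {
			struct intf *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}

/* Count the registered interfaces. */
int intf_count(void)
{
	int n = 0;

	for (struct intf *p = intf_list; p; p = p->next)
		n++;
	return n;
}
```

Adding eight interfaces yields numbers 0 through 7, enough for the 8-node configuration Carol mentions, with no compile-time limit.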
> Finally, I don't believe dynamic node plugging will generally be an
> issue for my system since the nodes are merged at boot time rather than
> being dynamically added and/or removed.
>   
So the time is not here yet, but I'm sure it's coming someday :)  I can
wait on this one, then, but I decided it would be pretty easy to do
through the hotplug subsystem.

-Corey
> Thanks very much,
>
> Carol Hebert
>
> On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote:
>   
>> Now the driver is doing exactly what it is supposed to do, but now that
>> may not be what we want.  I'm not sure of the configuration of this
>> system, but the information below gives me some clues.  Here's my guess
>> on the system:
>>
>> This is a NUMA system with hot-plug CPU boards.  Each board has an IPMI
>> controller on it.  The BIOS maps the I/O address and SMBIOS tables for
>> the IPMI controller to different I/O locations based upon the slot the
>> board is in.  There are a number of problems beyond this one for a
>>

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-11 Thread Carol Hebert

Hi,

I believe your assessment of my x460 dual-node system configuration is
correct with the exception of maybe changing the word "slot" to "system"
since the nodes are joined by scalability cables rather than being
connected via a common backplane.

Regarding the uniqueness of the Device ID, I think you mentioned in an
earlier email that the spec was a bit contradictory on the topic.  I
took a look at the spec and agree that it is not at all clear whether
the Device ID should be unique for all controllers or only for ones that
support a different set of application commands/OEM fields.  In one
paragraph, it states that:

"Controllers that implement identical sets of applications (sic)
commands can have the same Device ID in a given system. Thus a
'standardized' controller could be produced where multiple instances of
the controller are used in a system, and all have the same Device ID
value.  The controllers would still be differentiable by their
address..."  

and in the *immediately following* paragraph, it states:

"A controller can optionally use the Device ID as an 'instance'
identifier if more than one controller of that kind is used in the
system."   (It then goes on to say that the GUID, however, is the
preferred method of uniquely identifying controllers.)

Sheesh.  :-}  

In checking out the dmidecode data, I verified that the addresses of the
controllers on the multi-node system are unique and available there.  So
both the GUID and the address are unique for the controllers on the
multi-node system whereas the Device ID is not.  Can we use the GUID
(maybe in some more easily digestible form) or the address instead of
the Device ID?   It seems like the only thing that's clear from the spec
is that the Device ID's uniqueness is something we can't count on.

Regarding the ipmi device support currently being fixed at a max of 4,
the largest multi-node configuration we currently have is 8 so we would
need to have the table size bumped up to at least 8.  However, for
future support, it might be useful to increase it even more (12, 16?).

Finally, I don't believe dynamic node plugging will generally be an
issue for my system since the nodes are merged at boot time rather than
being dynamically added and/or removed.

Thanks very much,

Carol Hebert

On Wed, 2006-10-11 at 10:25 -0500, Corey Minyard wrote:
> Now the driver is doing exactly what it is supposed to do, but now that
> may not be what we want.  I'm not sure of the configuration of this
> system, but the information below gives me some clues.  Here's my guess
> on the system:
> 
> This is a NUMA system with hot-plug CPU boards.  Each board has an IPMI
> controller on it.  The BIOS maps the I/O address and SMBIOS tables for
> the IPMI controller to different I/O locations based upon the slot the
> board is in.  There are a number of problems beyond this one for a
> configuration of this nature.  I'll address those later.
> 
> In response to your question, I believe this is exactly what the Device
> ID in IPMI is intended for.  Each board in the system should have a
> unique device id based upon the slot it is in.  Say you have an
> application that monitors the CPU temperature of all the CPUs.  If a
> temperature goes out of range, you want to know which board that CPU is
> on.  And the Device ID can tell you that.  The IPMI device numbers that
> you suggest using are arbitrary, especially in a hot-plug system where
> devices can come and go dynamically.
> 
> In addition, you would probably want to be able to do udev mappings so
> that the same slots appear as the same device names (slot 1 is
> /dev/ipmi1, slot 2 is /dev/ipmi2, etc.).  The driver needs to be able to
> give udev information about the devices, and the Product ID/Device ID is
> really all it's got.
> 
> Now for the other problems:
> 
>1. The IPMI driver doesn't currently support an arbitrary number of
>   devices.  It has a fixed table of four.  I can fix this fairly
>   easily, though.  I wasn't really expecting a system to be designed
>   like this.
>2. The IPMI driver has no way to handle dynamic node plugging.  I
>   don't know of a standard way to tell the IPMI driver: "Hey, you
>   have a new controller here".  The driver should support adding new
>   devices dynamically, but I need some way to know the device is
>   there, or that it is going away.
>3. I don't think the IPMI driver provides a way for sysfs to report
>   the information that udev needs to do the udev mappings properly.
>   As always with sysfs, this is probably easy once you spend 2 days
>   figuring out what to do.
> 
> Am I on the right track here?




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-11 Thread Corey Minyard
Now the driver is doing exactly what it is supposed to do, but now that
may not be what we want.  I'm not sure of the configuration of this
system, but the information below gives me some clues.  Here's my guess
on the system:

This is a NUMA system with hot-plug CPU boards.  Each board has an IPMI
controller on it.  The BIOS maps the I/O address and SMBIOS tables for
the IPMI controller to different I/O locations based upon the slot the
board is in.  There are a number of problems beyond this one for a
configuration of this nature.  I'll address those later.

In response to your question, I believe this is exactly what the Device
ID in IPMI is intended for.  Each board in the system should have a
unique device id based upon the slot it is in.  Say you have an
application that monitors the CPU temperature of all the CPUs.  If a
temperature goes out of range, you want to know which board that CPU is
on.  And the Device ID can tell you that.  The IPMI device numbers that
you suggest using are arbitrary, especially in a hot-plug system where
devices can come and go dynamically.

In addition, you would probably want to be able to do udev mappings so
that the same slots appear as the same device names (slot 1 is
/dev/ipmi1, slot 2 is /dev/ipmi2, etc.).  The driver needs to be able to
give udev information about the devices, and the Product ID/Device ID is
really all it's got.
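To illustrate why a name keyed on the BMC's own identity survives reordering while a probe index does not, here is a minimal userspace C sketch in the spirit of a udev rule table. The rule table, the `bmc_name()` helper, and the Device IDs 0x21 and 0x22 are all hypothetical; the sketch assumes, as proposed above, that the BIOS gives each slot's BMC a distinct Device ID.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical naming rules: each maps a BMC identity key
 * ("product:device", in hex) to a device name. The Device IDs are
 * made-up per-slot values, not taken from real hardware.
 */
struct name_rule {
	const char *key;
	const char *name;
};

static const struct name_rule rules[] = {
	{ "0007:21", "ipmi1" },	/* board in slot 1 */
	{ "0007:22", "ipmi2" },	/* board in slot 2 */
};

/*
 * Look up the name for a BMC from its Product ID and Device ID only.
 * The probe order (intf_num) is deliberately not an input, so a board
 * keeps its name no matter which interface the driver finds first.
 * Returns NULL if no rule matches.
 */
static const char *bmc_name(unsigned int product_id, unsigned int device_id)
{
	char key[16];
	size_t i;

	snprintf(key, sizeof(key), "%04x:%02x", product_id, device_id);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(rules[i].key, key) == 0)
			return rules[i].name;
	return NULL;
}
```

Note the limitation this exposes: in the traces in this thread both BMCs report prod_id 0x0007 and dev_id 0x20, so both boards produce the same key and no identity-based rule can tell them apart.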

Now for the other problems:

   1. The IPMI driver doesn't currently support an arbitrary number of
      devices.  It has a fixed table of four.  I can fix this fairly
      easily, though.  I wasn't really expecting a system to be designed
      like this.
   2. The IPMI driver has no way to handle dynamic node plugging.  I
      don't know of a standard way to tell the IPMI driver: "Hey, you
      have a new controller here".  The driver should support adding new
      devices dynamically, but I need some way to know the device is
      there, or that it is going away.
   3. I don't think the IPMI driver provides a way for sysfs to report
      the information that udev needs to do the udev mappings properly.
      As always with sysfs, this is probably easy once you spend 2 days
      figuring out what to do.
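Item 1 amounts to replacing a fixed four-entry array with one that grows on demand; item 2 additionally needs removal to free a slot for reuse. The following is a userspace C sketch of that bookkeeping only. `struct smi_table`, `smi_table_add()` and `smi_table_remove()` are hypothetical names, and `realloc()` stands in for whatever allocation the kernel driver would actually use.

```c
#include <stdlib.h>

/* Hypothetical stand-in for per-interface driver state. */
struct smi_info {
	unsigned long io_addr;	/* e.g. KCS base from SMBIOS (0x90a8, 0xca8 here) */
};

/* Hypothetical growable interface table (no fixed limit of four). */
struct smi_table {
	struct smi_info **slots;
	size_t cap;
};

/* Add an interface; returns its index, or -1 on allocation failure. */
static int smi_table_add(struct smi_table *t, struct smi_info *info)
{
	size_t i, newcap;
	struct smi_info **p;

	/* Reuse a slot freed by an earlier hot removal, if any. */
	for (i = 0; i < t->cap; i++) {
		if (!t->slots[i]) {
			t->slots[i] = info;
			return (int)i;
		}
	}

	/* Table full: double its size (start at four). */
	newcap = t->cap ? t->cap * 2 : 4;
	p = realloc(t->slots, newcap * sizeof(*p));
	if (!p)
		return -1;
	for (i = t->cap; i < newcap; i++)
		p[i] = NULL;	/* mark new entries empty */
	t->slots = p;
	i = t->cap;
	p[i] = info;
	t->cap = newcap;
	return (int)i;
}

/* Remove an interface (e.g. on hot unplug); its index becomes reusable. */
static void smi_table_remove(struct smi_table *t, int idx)
{
	if (idx >= 0 && (size_t)idx < t->cap)
		t->slots[idx] = NULL;
}
```

Because freed indices are reused, an index still says nothing about which slot a board sits in, which is the point made above about interface numbers being arbitrary.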

Am I on the right track here?

-Corey

Carol Hebert wrote:
> Hi Corey,
>
> I'm still having problems with the new patches due to the device ID and
> the Product ID being the same on each of the nodes (still have
> segfault/oops).  The dual node system is really two separate nodes that
> are joined at will (via RSA setup).  Since each began life (and can
> resume life at any time) as a standalone system, isn't it reasonable
> that they could have the same BMC Product and Device IDs?  If not, do
> you think this is something that could/should be changed/set in the BIOS
> for each BMC on multi-node systems?
>
> Alternately, would it be possible to differentiate between the two BMCs
> for sysfs file naming purposes by using the value of intf->intf_num in
> ipmi_bmc_register()?  I believe that's pretty similar to what's
> currently done to differentiate between the ipmi.0 and ipmi.1
> interfaces.  As an example, I tacked the intf_num onto the product id in
> ipmi_bmc_register() (your and Jeff's patched version of the
> ipmi_msghandler.c file):
>
>      } else {
> -        char name[14];
> +        char name[16];
>          snprintf(name, sizeof(name),
> -                 "ipmi_bmc.%4.4x", bmc->id.product_id);
> +                 "ipmi_bmc.%4.4x%d", bmc->id.product_id,
> +                 intf->intf_num);
>
> and the modules loaded fine.  The file names become:  ipmi_bmc.00070.32
> and ipmi_bmc.00071.32 (see debug trace below).  I suspect I may be
> grossly oversimplifying the feasibility/usability/implementation of this
> solution but at first glance/touch test, it appears to work so I thought
> it might be good to discuss it.
>
> Anyway, thanks again for your help.  Please let me know what you'd like
> me to try next.  Also, I can probably get some time on a 4-node and/or
> an 8-node system so we can really stress the solution once we've settled
> on a fix.
>
> Thanks much,
>
> Carol Hebert  
>
> -
>
> kobject ipmi_msghandler: registering. parent: , set: module
> kobject_uevent
> fill_kobj_path: path = '/module/ipmi_msghandler'
> kobject ipmi: registering. parent: , set: drivers
> kobject_uevent
> fill_kobj_path: path = '/bus/platform/drivers/ipmi'
> ipmi message handler version 39.0
> kobject ipmi_devintf: registering. parent: , set: module
> kobject_uevent
> fill_kobj_path: path = '/module/ipmi_devintf'
> ipmi device interface
> subsystem ipmi: registering
> kobject ipmi: registering. parent: , set: class
> kobject ipmi_si: registering. parent: , set: module
> kobject_uevent
> fill_kobj_path: path = '/module/ipmi_si'
> kobject ipmi_si: registering. parent: , set: drivers
> kobject_uevent
> fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
> IPMI System Interface driver.
> ipmi_si: Trying SMBIOS-specified KCS 

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-10 Thread Carol Hebert
Hi Corey,

I'm still having problems with the new patches due to the device ID and
the Product ID being the same on each of the nodes (still have
segfault/oops).  The dual node system is really two separate nodes that
are joined at will (via RSA setup).  Since each began life (and can
resume life at any time) as a standalone system, isn't it reasonable
that they could have the same BMC Product and Device IDs?  If not, do
you think this is something that could/should be changed/set in the BIOS
for each BMC on multi-node systems?

Alternately, would it be possible to differentiate between the two BMCs
for sysfs file naming purposes by using the value of intf->intf_num in
ipmi_bmc_register()?  I believe that's pretty similar to what's
currently done to differentiate between the ipmi.0 and ipmi.1
interfaces.  As an example, I tacked the intf_num onto the product id in
ipmi_bmc_register() (your and Jeff's patched version of the
ipmi_msghandler.c file):

     } else {
-        char name[14];
+        char name[16];
         snprintf(name, sizeof(name),
-                 "ipmi_bmc.%4.4x", bmc->id.product_id);
+                 "ipmi_bmc.%4.4x%d", bmc->id.product_id,
+                 intf->intf_num);

and the modules loaded fine.  The file names become:  ipmi_bmc.00070.32
and ipmi_bmc.00071.32 (see debug trace below).  I suspect I may be
grossly oversimplifying the feasibility/usability/implementation of this
solution but at first glance/touch test, it appears to work so I thought
it might be good to discuss it.
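The buffer change in the diff above follows from the format: "ipmi_bmc." is 9 characters plus 4 hex digits plus the terminating NUL, exactly 14 bytes, so appending intf_num needs more room; 16 bytes fits an intf_num of up to two digits. The userspace C sketch below reproduces the proposed name and adds a truncation check. `make_bmc_name()` and `make_platform_name()` are hypothetical helpers; the second only mimics the ".<id>" suffix visible in the trace below (the device id, 0x20 = 32).

```c
#include <stdio.h>
#include <string.h>

#define BMC_NAME_LEN 16	/* the enlarged buffer from the proposed diff */

/*
 * Build the proposed name: product id as 4 hex digits, then intf_num
 * in decimal. Returns 0 if it fit, -1 if snprintf() had to truncate.
 */
static int make_bmc_name(char *buf, unsigned int product_id, int intf_num)
{
	int n = snprintf(buf, BMC_NAME_LEN, "ipmi_bmc.%4.4x%d",
			 product_id, intf_num);

	if (n < 0)
		return -1;
	/* On truncation the stored string is shorter than n. */
	return strlen(buf) == (size_t)n ? 0 : -1;
}

/* Mimic the ".<id>" suffix seen on the registered platform device. */
static void make_platform_name(char *out, size_t len, const char *name, int id)
{
	snprintf(out, len, "%s.%d", name, id);
}
```

One design consequence worth weighing: intf_num reflects discovery order, so a board could get a different name if the interfaces were found in a different order.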

Anyway, thanks again for your help.  Please let me know what you'd like
me to try next.  Also, I can probably get some time on a 4-node and/or
an 8-node system so we can really stress the solution once we've settled
on a fix.

Thanks much,

Carol Hebert  

-

kobject ipmi_msghandler: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_msghandler'
kobject ipmi: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi'
ipmi message handler version 39.0
kobject ipmi_devintf: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_devintf'
ipmi device interface
subsystem ipmi: registering
kobject ipmi: registering. parent: , set: class
kobject ipmi_si: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_si'
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
IPMI System Interface driver.
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address
0x90a8, slave address 0x20, irq 0
kobject ipmi_si.0: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.0
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
CAH: ipmi: NEW BMC: name =  ipmi_bmc.00070;  intf_num = 0
kobject ipmi_bmc.00070.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.00070.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.00070.32'
ipmi: Found new BMC (man_id: 0x02,  prod_id: 0x0007, dev_id: 0x20)
kobject ipmi0: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi0'
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
 IPMI KCS interface initialized
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8,
slave address 0x20, irq 0
kobject ipmi_si.1: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.1
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
CAH: ipmi: NEW BMC: name =  ipmi_bmc.00071;  intf_num = 1
kobject ipmi_bmc.00071.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.00071.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.00071.32'
ipmi: Found new BMC (man_id: 0x02,  prod_id: 0x0007, dev_id: 0x20)
kobject ipmi1: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi1'
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
 IPMI KCS interface initialized
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/pci/drivers/ipmi_si'


On Tue, 2006-10-10 at 10:49 -0500, Corey Minyard wrote: 
> Sorry, I messed up the error recovery in the previous patch.  This one
> should fix it; I've simulated this and it works fine.  I've also
> included a patch from Jeff Garzik that does some more cleanup.  Jeff's
> patch must be applied first; it is named "ipmi-handle-sysfs-errors.patch".
> 
> I'm still not sure what to do about the naming problem, though.  I am
> assuming the two devices have different GUIDs; otherwise they would
> show up as the same BMC.  I'd prefer to not use the GUID, as it is
> huge and meaningless to humans and applications.
> 
> I re-read the section in the spec again, and I really believe it is the
> intent that different BMCs on the

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-09 Thread Carol Hebert
Hi Corey,

Thanks very much for the patch.  :-)  I built it and ran it on my system
and it works a bit better than the original but it still has some
problems.  I'm attaching the dmesg output below (with a bit of debug
turned on in it).  

With the patch, modprobe appears to create one of the two ipmi
device nodes (ipmi0) expected for the dual-node system, although modprobe
of ipmi_si appears to hang.  Could you please take a look at the error
messages below and see if you can spot the problem?

Thanks much again,

Carol Hebert

-
kobject ipmi_msghandler: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_msghandler'
kobject ipmi: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi'
ipmi message handler version 39.0
kobject ipmi_devintf: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_devintf'
ipmi device interface
subsystem ipmi: registering
kobject ipmi: registering. parent: , set: class
kobject ipmi_si: registering. parent: , set: module
kobject_uevent
fill_kobj_path: path = '/module/ipmi_si'
kobject ipmi_si: registering. parent: , set: drivers
kobject_uevent
fill_kobj_path: path = '/bus/platform/drivers/ipmi_si'
IPMI System Interface driver.
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address
0x90a8, slave address 0x20, irq 0
kobject ipmi_si.0: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.0
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
kobject ipmi_bmc.0007.32: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_bmc.0007.32
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_bmc.0007.32'
ipmi: Found new BMC (man_id: 0x02,  prod_id: 0x0007, dev_id: 0x20)
kobject ipmi0: registering. parent: ipmi, set: class_obj
kobject_uevent
fill_kobj_path: path = '/class/ipmi/ipmi0'
fill_kobj_path: path = '/devices/platform/ipmi_si.0'
 IPMI KCS interface initialized
ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8,
slave address 0x20, irq 0
kobject ipmi_si.1: registering. parent: platform, set: devices
PM: Adding info for platform:ipmi_si.1
kobject_uevent
fill_kobj_path: path = '/devices/platform/ipmi_si.1'
kobject ipmi_bmc.0007.32: registering. parent: platform, set: devices
kobject_add failed for ipmi_bmc.0007.32 with -EEXIST, don't try to
register things with the same name in the same directory.
 [] show_trace_log_lvl+0x58/0x16a
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] kobject_add+0x186/0x1ac
 [] device_add+0x7a/0x2de
 [] platform_device_add+0xde/0x10e
 [] platform_device_register+0x15/0x18
 [] ipmi_register_smi+0x563/0x987 [ipmi_msghandler]
 [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f/0x6db [ipmi_si]
 [] sys_init_module+0x16ad/0x1856
 [] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] kobject_add+0x186/0x1ac
 [] device_add+0x7a/0x2de
 [] platform_device_add+0xde/0x10e
 [] platform_device_register+0x15/0x18
 [] ipmi_register_smi+0x563/0x987 [ipmi_msghandler]
 [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f/0x6db [ipmi_si]
 [] sys_init_module+0x16ad/0x1856
 [] syscall_call+0x7/0xb
kobject ipmi_bmc.0007.32: cleaning up
ipmi_msghandler: Unable to register bmc device: -17
ipmi_si: Unable to register device: error -17
BUG: unable to handle kernel paging request at virtual address 6b6b6c73
 printing eip:
c04ab7f4
*pde = 
Oops:  [#1]
SMP
last sysfs file: /class/drm/card0/dev
Modules linked in: ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U)
radeon(U) drm(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U)
sunrpc(U) ipv6(U) acpi_cpufreq(U) video(U) sbs(U) i2c_ec(U) button(U)
battery(U) asus_acpi(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U)
sg(U) i2c_piix4(U) ide_cd(U) i2c_core(U) aacraid(U) tg3(U) cdrom(U)
serio_raw(U) pcspkr(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U)
aic94xx(U) libsas(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) ext3(U)
jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
CPU:14
EIP:0060:[]Not tainted VLI
EFLAGS: 00010212   (2.6.18-ipmipatch #3)
EIP is at sysfs_remove_link+0x1/0xd
eax: 6b6b6c43   ebx: f54c876c   ecx: c042d7c9   edx: f9781b20
esi: 6b6b6b6b   edi: f54c876c   ebp: f4894e58   esp: f4894e48
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 5643, ti=f4894000 task=f6c5e030 task.ti=f4894000)
Stack: f4894e58 f977fec3 ffef  f4894e6c f9780564 ffef
dfc0db38
   ffef f4894e84 f978ef34 0118f8be 0ca8 0004 
f4894eac
   f978f99e  0004 c302d700 010020ac 0ca8 f9797500
f9797500
Call Trace:
 [] ipmi_bmc_unregister+0x20/0x6e [ipmi_msghandler]
 [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
 [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
 [] init_ipmi_si+0x40f

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-08 Thread Corey Minyard
Hopefully the attached patch will fix the problem and clean up the error
handling in this failure case.
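The attached patch is not reproduced in this archive. For context, the oops quoted below dies in sysfs_remove_link() called from ipmi_bmc_unregister() on the failure path, and several registers hold values like 0x6b6b6b6b; 0x6b is the byte slab debugging writes into freed objects, which suggests the cleanup touched state that had been freed or never set up. A common remedy is to record which registration steps succeeded and undo only those, in reverse order. The C sketch below shows that pattern with entirely hypothetical names; it is not the driver's code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-interface registration state. */
struct bmc_reg {
	bool dev_registered;	/* platform device was added */
	bool links_created;	/* sysfs links were created */
};

/* Hypothetical steps; the "fail" flags let the sketch simulate errors. */
static int register_dev(struct bmc_reg *r, bool fail)
{
	if (fail)
		return -EEXIST;	/* -17 on Linux, as in the log */
	r->dev_registered = true;
	return 0;
}

static int create_links(struct bmc_reg *r, bool fail)
{
	if (fail)
		return -ENOMEM;
	r->links_created = true;
	return 0;
}

/* Undo only what actually succeeded, in reverse order. */
static void bmc_unregister(struct bmc_reg *r)
{
	if (r->links_created)
		r->links_created = false;	/* stand-in for removing links */
	if (r->dev_registered)
		r->dev_registered = false;	/* stand-in for unregistering */
}

static int bmc_register(struct bmc_reg *r, bool fail_dev, bool fail_links)
{
	int rv;

	r->dev_registered = false;
	r->links_created = false;

	rv = register_dev(r, fail_dev);
	if (rv)
		return rv;	/* nothing to undo yet */
	rv = create_links(r, fail_links);
	if (rv)
		goto out_unreg;
	return 0;

out_unreg:
	bmc_unregister(r);	/* safe: only set-up steps are undone */
	return rv;
}
```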

-Corey

Carol Hebert wrote:
> Hi Corey,
>
> I believe I may have found a problem with the ipmi driver v39 in the
> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
> a dual-node x460 with two BMCs).  At first glance, it appears the
> problem may be in the sysfs code added last January -- it looks like it
> may not be handling the multiple BMCs correctly.   The result is that
> the ipmi_si module won't load and the ipmi device nodes don't get
> created.
>
> I'm only starting to debug the issue but wanted to let you know what
> I've seen asap in case someone's already spotted this problem but I
> missed seeing a patch and also because I'm not a sysfs expert and I
> don't know what the original intent was for how to present multiple BMCs
> (from multi-node systems) in the sysfs.
>
> I'm pasting the stack backtrace below.  Please let me know if you have
> any suggestions or questions.
>
> Thanks much,
>
> Carol Hebert
>
>
> ipmi message handler version 39.0
> IPMI System Interface driver.
> ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address
> 0x90a8, slave address 0x20, irq 0
> PM: Adding info for platform:ipmi_si.0
> PM: Adding info for platform:ipmi_bmc.32
> ipmi: Found new BMC (man_id: 0x02,  prod_id: 0x0007, dev_id: 0x20)
>  IPMI KCS interface initialized
> ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca8,
> slave address 0x20, irq 0
> PM: Adding info for platform:ipmi_si.1
> kobject_add failed for ipmi_bmc.32 with -EEXIST, don't try to register
> things with the same name in the same directory.
>  [] show_trace_log_lvl+0x58/0x16a
>  [] show_trace+0xd/0x10
>  [] dump_stack+0x19/0x1b
>  [] kobject_add+0x14b/0x171
>  [] device_add+0x7a/0x2de
>  [] platform_device_add+0xde/0x10e
>  [] platform_device_register+0x15/0x18
>  [] ipmi_register_smi+0x538/0x94a [ipmi_msghandler]
>  [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
>  [] init_ipmi_si+0x40f/0x6db [ipmi_si]
>  [] sys_init_module+0x16ad/0x1856
>  [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
> Leftover inexact backtrace:
>  [] show_trace+0xd/0x10
>  [] dump_stack+0x19/0x1b
>  [] kobject_add+0x14b/0x171
>  [] device_add+0x7a/0x2de
>  [] platform_device_add+0xde/0x10e
>  [] platform_device_register+0x15/0x18
>  [] ipmi_register_smi+0x538/0x94a [ipmi_msghandler]
>  [] try_smi_init+0x3ff/0x5a7 [ipmi_si]
>  [] init_ipmi_si+0x40f/0x6db [ipmi_si]
>  [] sys_init_module+0x16ad/0x1856
>  [] syscall_call+0x7/0xb
> ipmi_msghandler: Unable to register bmc device: -17
> ipmi_si: Unable to register device: error -17
> BUG: unable to handle kernel paging request at virtual address 6b6b6c73
>  printing eip:
> c04aa1d4
> *pde = 6b6b6b6b
> Oops:  [#1]
> SMP
> last sysfs file: /class/drm/card0/dev
> Modules linked in: ipmi_si ipmi_msghandler radeon drm autofs4 hidp
> rfcomm l2cap bluetooth sunrpc ipv6 acpi_cpufreq video sbs i2c_ec button
> battery asus_acpi ac parport_pc lp parport joydev sg pcspkr tg3 aacraid
> i2c_piix4 i2c_core ide_cd cdrom serio_raw dm_snapshot dm_zero dm_mirror
> dm_mod aic94xx libsas scsi_transport_sas sd_mod scsi_mod ext3 jbd
> ehci_hcd ohci_hcd uhci_hcd
> CPU:8
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010212   (2.6.18-1.2702.el5PAE #1)
> EIP is at sysfs_remove_link+0x1/0xd
> eax: 6b6b6c43   ebx: e722ad78   ecx: c042dc05   edx: f8b0aad8
> esi: 6b6b6b6b   edi: e722ad78   ebp: e7152e58   esp: e7152e48
> ds: 007b   es: 007b   ss: 0068
> Process modprobe (pid: 20599, ti=e7152000 task=f72b0030
> task.ti=e7152000)
> Stack: e7152e58 f8b08ebf ffef  e7152e6c f8b09559 ffef
> eeb70248
>ffef e7152e84 f980bf34 0118c8be 0ca8 0004 
> e7152eac
>f980c99e  0004 d1c2d700 010020ac 0ca8 f9814480
> f9814480
> Call Trace:
>  [] ipmi_bmc_unregister+0x1c/0x63 [ipmi_msghandler]
>  [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
>  [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
>  [] init_ipmi_si+0x40f/0x6db [ipmi_si]
>  [] sys_init_module+0x16ad/0x1856
>  [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
> Leftover inexact backtrace:
>  [] show_stack_log_lvl+0x8a/0x95
>  [] show_registers+0x12d/0x19a
>  [] die+0x190/0x293
>  [] do_page_fault+0x4e8/0x5ba
>  [] error_code+0x39/0x40
>  [] ipmi_unregister_smi+0xf/0xc3 [ipmi_msghandler]
>  [] try_smi_init+0x4d5/0x5a7 [ipmi_si]
>  [] init_ipmi_si+0x40f/0x6db [ipmi_si]
>  [] sys_init_module+0x16ad/0x1856
>  [] syscall_call+0x7/0xb
> Code: f1 f8 ff 8b 45 f0 e8 06 d0 03 00 8b 45 ec e8 fe cf 03 00 8b 55 e4
> 8b 4d e0 8b 41 1c 89 54 81 20 83 c4 14 31 c0 5b 5e 5f 5d c3 55 <8b> 40
> 30 89 e5 e8 d0 e4 ff ff 5d c3 55 89 e5 57 56 89 ce 53 83
> EIP: [] sysfs_remove_link+0x1/0xd SS:ESP 0068:e7152e48
>
>   

This patch adds the product id to the driver model platform device
name, in addition to the device id.  The IPMI spec does not require
that individual BMCs in

Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-08 Thread Corey Minyard
The basic problem is that platform_device_alloc() is being called with
the device id, but not the product id, as part of the name.  According to
the spec, the combination of the two is required to be unique on a
machine, but the device id appears to be the same on both BMCs.
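The collision can be modeled in a few lines of userspace C: a sysfs directory refuses a second entry with an existing name, which is the -EEXIST (-17) that kobject_add() reported. In the sketch below, `struct fake_dir`, `dir_add()` and the two naming helpers are hypothetical; the name formats mirror those in the posted traces ("ipmi_bmc.32", and "ipmi_bmc.0007.32" after adding the product id).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 8
#define NAME_LEN 32

/* Hypothetical model of one sysfs directory. */
struct fake_dir {
	char names[MAX_DEVS][NAME_LEN];
	int count;
};

/* Reject a duplicate name with -EEXIST, as kobject_add() did. */
static int dir_add(struct fake_dir *d, const char *name)
{
	int i;

	for (i = 0; i < d->count; i++)
		if (strcmp(d->names[i], name) == 0)
			return -EEXIST;
	if (d->count == MAX_DEVS)
		return -ENOSPC;
	snprintf(d->names[d->count++], NAME_LEN, "%s", name);
	return 0;
}

/* Device id only, as in the original failing trace. */
static void name_dev_only(char *buf, unsigned int prod, unsigned int dev)
{
	(void)prod;	/* product id unused: the source of the collision */
	snprintf(buf, NAME_LEN, "ipmi_bmc.%u", dev);
}

/* Product id plus device id, as in the later trace. */
static void name_prod_dev(char *buf, unsigned int prod, unsigned int dev)
{
	snprintf(buf, NAME_LEN, "ipmi_bmc.%4.4x.%u", prod, dev);
}
```

Adding the product id only helps when the product ids differ. In the load logs posted in this thread, both BMCs report prod_id 0x0007 and dev_id 0x20, so both schemes still produce duplicate names there.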

Carol, can you confirm that the product ids are different?  They are
printed at driver load time.

I'll get a patch soon.

-Corey

Yani Ioannou wrote:
> Hi Carol,
>
> On 10/6/06, Carol Hebert <[EMAIL PROTECTED]> wrote:
>   
>> I believe I may have found a problem with the ipmi driver v39 in the
>> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
>> a dual-node x460 with two BMCs).  At first glance, it appears the
>> problem may be in the sysfs code added last January -- it looks like it
>> may not be handling the multiple BMCs correctly.   The result is that
>> the ipmi_si module won't load and the ipmi device nodes don't get
>> created.
>> 
>
> I guess I shouldn't be surprised; it's very hard to find someone with
> access to a system with multiple BMCs (not just multiple interfaces)
> who is willing to test this out.  I only have access to an old
> HP workstation with a rudimentary IPMI 1.0 card myself.
>
>   
>> I'm only starting to debug the issue but wanted to let you know what
>> I've seen asap in case someone's already spotted this problem but I
>> missed seeing a patch and also because I'm not a sysfs expert and I
>> don't know what the original intent was for how to present multiple BMCs
>> (from multi-node systems) in the sysfs.
>> 
>
> I did write the code to handle multiple BMCs, but it looks like I
> overlooked something.  From your backtrace, at first glance it appears
> that some sysfs file is being duplicated in the same directory. Could
> you perhaps turn on sysfs/kobject debugging in the kernel debugging
> options?
>
> Thanks,
> Yani
>
> -
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys -- and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> ___
> Openipmi-developer mailing list
> Openipmi-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openipmi-developer
>   




Re: [Openipmi-developer] ipmi_si appears to be broken on multinode systems in 2.6.18 kernel

2006-10-06 Thread Yani Ioannou
Hi Carol,

On 10/6/06, Carol Hebert <[EMAIL PROTECTED]> wrote:
> I believe I may have found a problem with the ipmi driver v39 in the
> 2.6.18 kernel when loaded on multi-node systems (in my particular case,
> a dual-node x460 with two BMCs).  At first glance, it appears the
> problem may be in the sysfs code added last January -- it looks like it
> may not be handling the multiple BMCs correctly.   The result is that
> the ipmi_si module won't load and the ipmi device nodes don't get
> created.

I guess I shouldn't be surprised; it's very hard to find someone with
access to a system with multiple BMCs (not just multiple interfaces)
who is willing to test this out.  I only have access to an old
HP workstation with a rudimentary IPMI 1.0 card myself.

> I'm only starting to debug the issue but wanted to let you know what
> I've seen asap in case someone's already spotted this problem but I
> missed seeing a patch and also because I'm not a sysfs expert and I
> don't know what the original intent was for how to present multiple BMCs
> (from multi-node systems) in the sysfs.

I did write the code to handle multiple BMCs, but it looks like I
overlooked something.  From your backtrace, at first glance it appears
that some sysfs file is being duplicated in the same directory. Could
you perhaps turn on sysfs/kobject debugging in the kernel debugging
options?

Thanks,
Yani
