Re: [dm-devel] Re: [2.6.23 PATCH 13/18] dm: netlink
Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Thu, 12 Jul 2007 18:19:20 -0500 Matt Mackall <[EMAIL PROTECTED]> wrote:
> > On Wed, Jul 11, 2007 at 02:27:12PM -0700, Andrew Morton wrote:
> > > On Wed, 11 Jul 2007 22:01:37 +0100 Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> > > > From: Mike Anderson <[EMAIL PROTECTED]>
> > > >
> > > > This patch adds dm-netlink skeleton support to the Makefile and the dm directory.
> > > >
> > > > ...
> > > >
> > > > +config DM_NETLINK
> > > > +	bool "DM netlink events (EXPERIMENTAL)"
> > > > +	depends on BLK_DEV_DM && EXPERIMENTAL
> > > > +	---help---
> > > > +	  Generate netlink events for DM events.
> > >
> > > Need a dependency on NET there?
> >
> > It's really sad to make DM dependent on the network layer.
>
> Yes, it would be somewhat sad. However one can presumably continue to use DM, just without "DM netlink events".

Yes, if you deselect it you just do not receive the events through netlink.

> But that probably means that one will not be able to use the standard DM admin tools without networking. Maybe there will remain alternative but cruder ways to get things done?

No, all admin tools and interfaces function as they do today. The dm-netlink patch series only contains 9 deletions (actually just one true deletion of existing kernel code; the others are due to breaking the patch up into compilable chunks). The intent was not to break users or force migration.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.23 PATCH 13/18] dm: netlink
Matt Mackall <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 11, 2007 at 02:27:12PM -0700, Andrew Morton wrote:
> > On Wed, 11 Jul 2007 22:01:37 +0100 Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> > > From: Mike Anderson <[EMAIL PROTECTED]>
> > >
> > > This patch adds dm-netlink skeleton support to the Makefile and the dm directory.
> > >
> > > ...
> > >
> > > +config DM_NETLINK
> > > +	bool "DM netlink events (EXPERIMENTAL)"
> > > +	depends on BLK_DEV_DM && EXPERIMENTAL
> > > +	---help---
> > > +	  Generate netlink events for DM events.
> >
> > Need a dependency on NET there?
>
> It's really sad to make DM dependent on the network layer.

It wouldn't be all of dm. Only the netlink-based event interface would be dependent.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
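For reference, here is a sketch of what the Kconfig entry might look like with the NET dependency Andrew asks about added; this is an illustrative fragment based on the entry quoted above, not a line from the posted patch:

```
config DM_NETLINK
	bool "DM netlink events (EXPERIMENTAL)"
	depends on BLK_DEV_DM && NET && EXPERIMENTAL
	---help---
	  Generate netlink events for DM events.
```

With this form, deselecting NET simply hides DM_NETLINK while the rest of device-mapper remains selectable, which is the behavior discussed in the thread.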
Re: [2.6.23 PATCH 14/18] dm: netlink add to core
Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> As for worry about kmallocs do these events happen often?

The worst case would most likely be in a dm multipath configuration, where you could get a burst of N events (N being the number of LUNs times the number of paths that are having an issue).

> I would not expect any of them in the normal course of operation for a system.

Yes, the ones that are part of this patch are unexpected events or recovery from an unexpected event.

> Worst case you handle extra kmallocs with a library function.
> It's not like you are using GFP_ATOMIC.

I was using GFP_ATOMIC because I did not want __GFP_IO: in some testing there was a case where heavy file system IO was going on, and an injected error put the swap device into a temporarily queued condition while an event was trying to be sent. I may need to go back and investigate this case on recent kernels, as it has been a while since I ran the test case.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
Re: [2.6.23 PATCH 14/18] dm: netlink add to core
Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> I may be a little off but looking at the event types defined,
> device down, device up, defining a completely new interface for this
> looks absolutely absurd.
>
> This is device hotplug isn't it? As such we should be using the
> hotplug infrastructure and not reinventing the wheel here.

I assume device hotplug means kobject_uevent and KOBJ_* events. The original intent was to have a little more structure in the data format than the env strings. I also wanted to reduce the number of allocations that were happening with GFP_KERNEL to send an event. Currently the patch only supports a couple of events, with the intent of adding more over time. I see that I could map most events to KOBJ_CHANGE; previously it did not seem like the correct fit.

> If it isn't hotplug it looks like something that inotify should
> handle.
>
> If that isn't the case I am fairly certain that md already has a
> mechanism to handle this, and those two should stay in sync
> if at all possible on this kind of thing.

Device mapper does have an "event happened" interface today, but after the event the user must determine its context (dm also sends a kobject_uevent KOBJ_CHANGE, only for a resume event). This patch only affects dm, but I know md has similar infrastructure. The patch passes out through netlink the event context that already existed but was lost through the current generic event interface. The existing event interface was left in place so as not to affect existing users, allowing migration over to a netlink interface over time.

> So this appears to be a gratuitous user interface addition.
> Why do we need a new user interface for this?

While I understand Evgeniy's and David's comments about utilizing the genetlink interface, I am not seeing how utilizing a netlink channel for a subsystem is a gratuitous user interface addition vs. running everything through kobject_uevent.

Thanks,
-andmike
--
Michael Anderson
[EMAIL PROTECTED]
Re: [2.6.23 PATCH 13/18] dm: netlink
David Miller <[EMAIL PROTECTED]> wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Thu, 12 Jul 2007 12:10:29 +0400
>
> > On Wed, Jul 11, 2007 at 04:37:36PM -0700, Mike Anderson ([EMAIL PROTECTED]) wrote:
> > > > > --- linux.orig/include/linux/netlink.h	2007-07-11 21:37:31.0 +0100
> > > > > +++ linux/include/linux/netlink.h	2007-07-11 21:37:50.0 +0100
> > > > > @@ -21,7 +21,7 @@
> > > > >  #define NETLINK_DNRTMSG		14	/* DECnet routing messages */
> > > > >  #define NETLINK_KOBJECT_UEVENT	15	/* Kernel messages to userspace */
> > > > >  #define NETLINK_GENERIC		16
> > > > > -/* leave room for NETLINK_DM (DM Events) */
> > > > > +#define NETLINK_DM		17	/* Device Mapper */
> > > > >  #define NETLINK_SCSITRANSPORT	18	/* SCSI Transports */
> > > > >  #define NETLINK_ECRYPTFS	19
> > > >
> > > > Have the net guys checked this?
> > >
> > > No. The support is a derivative of the netlink support in scsi_transport_iscsi.c.
> >
> > I'm not sure about all net guys, but the first question raised after reading this: why do you want a special netlink family and do not want to use interfaces created on top of it, like connector and genetlink?
>
> I agree, there is really no reason to not at least use genetlink.

ok, I will switch over to using genetlink.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
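For readers unfamiliar with the alternative being proposed, here is a rough sketch (written from memory of the 2.6.23-era genetlink API; the family name "DM_EVENTS" is made up for illustration) of what replacing the fixed NETLINK_DM protocol number with genetlink registration could look like. This is a non-compilable fragment, not the patch that was eventually posted:

```c
/* Sketch only: assumes <net/genetlink.h>. */
static struct genl_family dm_genl_family = {
	.id      = GENL_ID_GENERATE, /* let genetlink assign a dynamic id */
	.name    = "DM_EVENTS",      /* hypothetical family name */
	.version = 1,
	.maxattr = 0,
};

int __init dm_netlink_init(void)
{
	/* Replaces netlink_kernel_create(NETLINK_DM, ...): no hard-coded
	 * protocol number in include/linux/netlink.h is needed. */
	return genl_register_family(&dm_genl_family);
}

void dm_netlink_exit(void)
{
	genl_unregister_family(&dm_genl_family);
}
```

The design advantage raised in the thread is that user space looks the family up by name, so no fixed slot in the netlink protocol table has to be reserved.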
Re: [2.6.23 PATCH 14/18] dm: netlink add to core
Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Wed, 11 Jul 2007 22:01:59 +0100 Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> > From: Mike Anderson <[EMAIL PROTECTED]>
> >
> > This patch adds support for the dm_path_event and dm_send_event functions, which create and send netlink attribute events.
> >
> > ...
> >
> > --- linux.orig/drivers/md/dm-netlink.c	2007-07-11 21:37:50.0 +0100
> > +++ linux/drivers/md/dm-netlink.c	2007-07-11 21:37:51.0 +0100
> > @@ -40,6 +40,17 @@ struct dm_event_cache {
> >
> >  static struct dm_event_cache _dme_cache;
> >
> > +struct dm_event {
> > +	struct dm_event_cache *cdata;
> > +	struct mapped_device *md;
> > +	struct sk_buff *skb;
> > +	struct list_head elist;
> > +};
> > +
> > +static struct sock *_dm_netlink_sock;
> > +static uint32_t _dm_netlink_daemon_pid;
> > +static DEFINE_SPINLOCK(_dm_netlink_pid_lock);
>
> The usage of this lock makes my head spin a bit. It's a shame it wasn't documented.
>
> There's obviously something very significant happening with process IDs in here. A description of the design would be helpful. Especially for the containerisation guys who no doubt will need to tear their hair out over it all ;)

ok, answered below.

> > +static int dm_netlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > +{
> > +	int r = 0;
> > +
> > +	if (security_netlink_recv(skb, CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	spin_lock(&_dm_netlink_pid_lock);
> > +	if (_dm_netlink_daemon_pid) {
> > +		if (_dm_netlink_daemon_pid != nlh->nlmsg_pid)
> > +			r = -EBUSY;
> > +	} else
> > +		_dm_netlink_daemon_pid = nlh->nlmsg_pid;
> > +	spin_unlock(&_dm_netlink_pid_lock);
> > +
> > +	return r;
> > +}
>
> This really does need some comments. nfi what it's all trying to do here.

The code is restricting the connection to a single daemon. I added the lock above because in some testing of connect/disconnect cycles with a lot of events I was receiving errors. The pid is a holdover from older code. If this will cause issues for other users I can switch to using nlmsg_multicast (genlmsg_multicast, depending on the comment about whether I need to switch to the genl interface) and remove this code altogether.

> > +static void dm_netlink_rcv(struct sock *sk, int len)
> > +{
> > +	unsigned qlen = 0;
>
> stupid gcc.
>
> > +
> > +	do
> > +		netlink_run_queue(sk, &qlen, &dm_netlink_rcv_msg);
> > +	while (qlen);
> > +
> > +}
>
> stray blank line there.

ok, removed.

> > +static int dm_netlink_rcv_event(struct notifier_block *this,
> > +				unsigned long event, void *ptr)
> > +{
> > +	struct netlink_notify *n = ptr;
> > +
> > +	spin_lock(&_dm_netlink_pid_lock);
> > +
> > +	if (event == NETLINK_URELEASE &&
> > +	    n->protocol == NETLINK_DM && n->pid &&
> > +	    n->pid == _dm_netlink_daemon_pid)
> > +		_dm_netlink_daemon_pid = 0;
> > +
> > +	spin_unlock(&_dm_netlink_pid_lock);
> > +
> > +	return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block dm_netlink_notifier = {
> > +	.notifier_call = dm_netlink_rcv_event,
> > +};
> > +
> >  int __init dm_netlink_init(void)
> >  {
> >  	int r;
> >
> > +	r = netlink_register_notifier(&dm_netlink_notifier);
> > +	if (r)
> > +		return r;
> > +
> > +	_dm_netlink_sock = netlink_kernel_create(NETLINK_DM, 0,
> > +						 dm_netlink_rcv, NULL,
> > +						 THIS_MODULE);
>
> I think we're supposed to use the genetlink APIs here. One for the net guys to check, please.

ok, I can switch if that is the new recommended method.

> > +	if (!_dm_netlink_sock) {
> > +		r = -ENOBUFS;
> > +		goto notifier_out;
> > +	}
> >  	r = dme_cache_init(&_dme_cache, DM_EVENT_SKB_SIZE);
> > -	if (!r)
> > -		DMINFO("version 1.0.0 loaded");
> > +	if (r)
> > +		goto socket_out;
> > +
> > +	DMINFO("version 1.0.0 loaded");
> > +
> > +	return 0;
> > +
> > +socket_out:
> > +	sock_release(_dm_netlink_sock->sk_socket);
> > +notifier_out:
> > +	netlink_unregister_notifier(&dm_netlink_notifier);
> > +	DMERR("%s: dme_cache_init failed: %d", __FUNCTION__, r);
> >  	return r;
> >  }
> > @@ -100,4 +292,6 @@ int __init dm_netlink_init(void)
> >  void dm_netlink_exit(void)
> >  {
> >  	dme_cache_destroy(&_dme_cache);
> > +	sock_release(_dm_netlink_sock->sk_socket);
> > +	netlink_unregister_notifier(&dm_netlink_notifier);
> >  }
> > Index: linux/drivers/md/dm-netlink.h
> > ===
> > --- linux.orig/drivers/md/dm-netlink.h	2007-07-11 21:37:50.0 +0100
> > +++ linux/drivers/md/dm-netlink.h	2007-07-11 21:37:51.0 +0100
> > @@ -21,19 +21,22 @@
> >  #ifndef DM_NETLINK_H
> >  #define DM_NETLINK_H
> >
> > -struct dm_event_cache;
> > +#include <linux/dm-netlink-if.h>
> > +
> > +struct dm_table;
> >  struct mapped_device;
> >
> > -struct dm_event {
> > -	struct dm_event_cache *cdata;
> > -	struct mapped_device *md;
> > -	struct sk_buff *skb;
> > -	struct list_head elist;
> > -};
> > +struct dm_event;
> > +
> > +void dm_event_add(struct mapped_device *md, struct
Re: [2.6.23 PATCH 13/18] dm: netlink
Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Wed, 11 Jul 2007 22:01:37 +0100 Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> > From: Mike Anderson <[EMAIL PROTECTED]>
> >
> > This patch adds dm-netlink skeleton support to the Makefile and the dm directory.
> >
> > ...
> >
> > +config DM_NETLINK
> > +	bool "DM netlink events (EXPERIMENTAL)"
> > +	depends on BLK_DEV_DM && EXPERIMENTAL
> > +	---help---
> > +	  Generate netlink events for DM events.
>
> Need a dependency on NET there?

Yes.

> > ...
> >
> > +#ifdef CONFIG_DM_NETLINK
> > +
> > +int dm_netlink_init(void);
> > +void dm_netlink_exit(void);
> > +
> > +#else /* CONFIG_DM_NETLINK */
> > +
> > +static inline int __init dm_netlink_init(void)
>
> The __init here isn't needed (doesn't make sense, is missing the required #include anyway)
>
> > +{
> > +	return 0;
> > +}
> > +static inline void dm_netlink_exit(void)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_DM_NETLINK */
> > +
> > +#endif /* DM_NETLINK_H */
> > Index: linux/drivers/md/dm.c
> > ===
> > --- linux.orig/drivers/md/dm.c	2007-07-11 21:37:47.0 +0100
> > +++ linux/drivers/md/dm.c	2007-07-11 21:37:50.0 +0100
> > @@ -7,6 +7,7 @@
> >
> >  #include "dm.h"
> >  #include "dm-bio-list.h"
> > +#include "dm-netlink.h"
> >
> >  #include <linux/init.h>
> >  #include <linux/module.h>
> > @@ -176,6 +177,7 @@ int (*_inits[])(void) __initdata = {
> >  	dm_linear_init,
> >  	dm_stripe_init,
> >  	dm_interface_init,
> > +	dm_netlink_init,
> >  };
> >
> >  void (*_exits[])(void) = {
> > @@ -184,6 +186,7 @@ void (*_exits[])(void) = {
> >  	dm_linear_exit,
> >  	dm_stripe_exit,
> >  	dm_interface_exit,
> > +	dm_netlink_exit,
> >  };
>
> hm, so if CONFIG_DM_NETLINK=n we end up taking the address of an inlined function. So the __init above _did_ make sense, in a peculiar way. I don't know that gcc will actually put that converted-to-non-inline function into the desired section though.
>
> There's no way in which those inlined functions will ever get inlined.
>
> Perhaps all this would be better if there was no implementation of dm_netlink_init() and dm_netlink_exit() if CONFIG_DM_NETLINK=n and you just whack the requisite ifdefs into these tables here?

ok, I will switch to the ifdef CONFIG_DM_NETLINK in the tables.

> >  static int __init dm_init(void)
> > Index: linux/include/linux/netlink.h
> > ===
> > --- linux.orig/include/linux/netlink.h	2007-07-11 21:37:31.0 +0100
> > +++ linux/include/linux/netlink.h	2007-07-11 21:37:50.0 +0100
> > @@ -21,7 +21,7 @@
> >  #define NETLINK_DNRTMSG		14	/* DECnet routing messages */
> >  #define NETLINK_KOBJECT_UEVENT	15	/* Kernel messages to userspace */
> >  #define NETLINK_GENERIC		16
> > -/* leave room for NETLINK_DM (DM Events) */
> > +#define NETLINK_DM		17	/* Device Mapper */
> >  #define NETLINK_SCSITRANSPORT	18	/* SCSI Transports */
> >  #define NETLINK_ECRYPTFS	19
>
> Have the net guys checked this?

No. The support is a derivative of the netlink support in scsi_transport_iscsi.c.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
Re: libata error handling
Luben Tuikov <[EMAIL PROTECTED]> wrote:
> On 08/19/05 15:38, Patrick Mansfield wrote:
> The eh_timed_out + eh_strategy_handler is actually pretty perfect,
> and _complete_, for any application and purpose in recovering a
> LU/device/host (in that order ;-) ).
>
> > The two problems I see with the hook are:
> >
> > It calls the driver in interrupt context, so the called function can't
> > sleep.
>
> Consider this: When SCSI Core told you that the command timed out,
> A) it has already finished,
> B) it hasn't already finished.
>
> In case A, you can return EH_HANDLED. In case B, you return
> EH_NOT_HANDLED, and deal with it in the eh_strategy_handler.
> (Hint: you can still "finish" it from there.)

But dealing with it in the eh_strategy_handler means that you may be stopping all IO on the host instance as soon as the first LUN returns EH_NOT_HANDLED for LUN-based cancelling. I still think we can do better here for an LLDD that cannot execute a cancel in interrupt context. Having an error handler that works is a plus; I would hope that some factoring would happen over time from the eh_strategy_handler to some transport (or other factor point) error handler. From a testing, support, and block-level multipath predictability standpoint, sharing code would be a good goal.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
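The A/B split Luben describes can be modeled in a few lines of user-space C. EH_HANDLED/EH_NOT_HANDLED are the real SCSI EH return values; everything else here (the `cmd` struct, the handler bodies) is a deliberately simplified stand-in, not mid-layer code:

```c
#include <assert.h>

/* Toy model of the two timeout cases: if the command already finished,
 * the eh_timed_out hook can report EH_HANDLED from interrupt context;
 * otherwise it returns EH_NOT_HANDLED and the eh_strategy_handler,
 * running in process context (where it may sleep), finishes it. */

enum eh_result { EH_HANDLED, EH_NOT_HANDLED };

struct cmd {
	int completed;	/* case A: already finished when the timer fired */
};

static enum eh_result eh_timed_out(struct cmd *c)
{
	return c->completed ? EH_HANDLED : EH_NOT_HANDLED;
}

/* Strategy handler: may sleep while cancelling, then "finishes" the
 * command itself (Luben's hint). */
static void eh_strategy_handler(struct cmd *c)
{
	c->completed = 1;
}

static void recover(struct cmd *c)
{
	if (eh_timed_out(c) == EH_NOT_HANDLED)
		eh_strategy_handler(c);
}
```

Mike's objection lives in `recover()`: the EH_NOT_HANDLED path serializes recovery through the single strategy handler, stalling the whole host instance.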
Re: 2.6.13-rc2-mm2 : oops in dm_mod
Laurent Riffard <[EMAIL PROTECTED]> wrote:
> device-mapper: 4.4.0-ioctl (2005-01-12) initialised: [EMAIL PROTECTED]
> Unable to handle kernel NULL pointer dereference at virtual address 00000094
> printing eip:
> d08612ec
> *pde = 00000000
> Oops: [#1]
> last sysfs file:
> Modules linked in: dm_mod joydev usbhid uhci_hcd usbcore video hotkey configfs
> CPU:    0
> EIP:    0060:[<d08612ec>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.13-rc2-mm2)
> EIP is at suspend_targets+0x8/0x42 [dm_mod]
> eax: 00000000   ebx: cf764340   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: 00000000   ebp: cf06bec4   esp: cf06beb8
> ds: 007b   es: 007b   ss: 0068
> Process lvm2 (pid: 1532, threadinfo=cf06a000 task=cfa3e520)
> Stack: cf764340 ffea cf06becc d0861330 cf06bf20 d085ff99
>        cfa3e520 c0114664
>        cfa3e520 c0114664 cf06bf20 d0861e0f cf615aa0
> Call Trace:
> [<c01038e1>] show_stack+0x76/0x7e
> [<c01039ea>] show_registers+0xea/0x152
> [<c0103b8e>] die+0xc2/0x13c
> [<c0113348>] do_page_fault+0x394/0x4d4
> [<c0103583>] error_code+0x4f/0x54
> [<d0861330>] dm_table_presuspend_targets+0xa/0xc [dm_mod]
> [<d085ff99>] dm_suspend+0x79/0x1a3 [dm_mod]
> [<d0862b51>] do_resume+0xee/0x173 [dm_mod]

I am also receiving a similar oops in a call to suspend_targets on bootup.

Alasdair, based on your previous patch
http://marc.theaimsgroup.com/?l=linux-kernel&m=112112298922766&w=2
how is suspend_targets supposed to protect against the NULL value now being passed in, or is something else going on here?

-andmike
--
Michael Anderson
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [OOPS] 2.6.11 - NMI lockup with CFQ scheduler
Tejun Heo <[EMAIL PROTECTED]> wrote:
> Jens Axboe wrote:
> > On Wed, Apr 06 2005, Arjan van de Ven wrote:
> >
> > > > @@ -324,6 +334,7 @@
> > > >  	issue_flush_fn		*issue_flush_fn;
> > > >  	prepare_flush_fn	*prepare_flush_fn;
> > > >  	end_flush_fn		*end_flush_fn;
> > > > +	release_queue_data_fn	*release_queue_data_fn;
> > > >
> > > >  	/*
> > > >  	 * Auto-unplugging state
> > >
> > > where does this function method actually get called?
> >
> > I missed the hunk in ll_rw_blk.c, rmk pointed the same thing out not 5
> > minutes ago :-)
> >
> > The patch would not work anyways, as scsi_sysfs.c clears queuedata
> > unconditionally. This is a better work-around, it just makes the queue
> > hold a reference to the device as well, only killing it when the queue is
> > torn down.
> >
> > Still not super happy with it, but I don't see how to solve the circular
> > dependency problem otherwise.
>
> Hello, Jens.
>
> I've been thinking about it for a while. The problem is that we're
> reference counting two different objects to track lifetime of one
> entity. This happens in both SCSI upper and mid layers. In the upper
> layer, genhd and scsi_disk (or scsi_cd, ...) are ref'ed separately while
> they share their destiny together (not really different entities), and in
> the middle layer scsi_device and request_queue do the same thing.
> Circular dependency is occurring because we separate one entity into two
> and reference count them separately. Two are actually one and
> necessarily want each other. (until death aparts. Wow, serious. :-)
>
> IMHO, what we need to do is consolidate ref counting such that in each
> layer only one object is reference counted, and the other object is
> freed when the ref counted object is released. The object of choice
> would be genhd in the upper layer and request_queue in the mid layer. All
> ref-counting should be updated to only ref those objects. We'll need to
> add a release callback to genhd and make request_queue properly
> reference counted.
> Conceptually, scsi_disk extends genhd and scsi_device extends
> request_queue. So, to go one step further, as what UL represents is
> genhd (disk device) and ML request_queue (request-based device),
> embedding scsi_disk into genhd and scsi_device into request_queue will
> make the architecture clearer. To do this, we'll need something like
> alloc_disk_with_udata(int minors, size_t udata_len) and the equivalent
> for request_queue.
>
> I've done this half-way, and then doing it without fixing the SCSI
> model seemed silly so I got into working on the state model. (BTW, the
> state model is almost done, I'm about to run tests.)
>
> What do you think? Jens?

Well, I think "extends" is one way to look at the subsystem objects. Couldn't it also be said that these objects from each subsystem just have a relationship (parent / child, etc.)? As reference counting has been implemented in each subsystem, interfaces that cross subsystem boundaries sometimes (had / have) not been converted to use similar lifetime rules.

Your solution tries to solve the problem by creating a new larger object that contains both of the old objects. Another solution would be to use consistent lifetime rules and stay with smaller objects. Unless going to large objects helps with allocation fragmentation or we get some other benefit, it would seem that these combined structures may at some point in the future limit the creation of lighter or more flexible objects.

It would appear another solution is that when you allocate a resource from another subsystem (i.e. blk_init_queue), both subsystems participate in the same reference counting model, and in the allocation routine you pass in your object to be reference counted by the allocating subsystem. Then when it is time to shut down you do not free the other subsystem's object directly, but use the normal release routines.
-andmike -- Michael Anderson [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
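The consolidation Tejun proposes — one reference-counted object per layer, with the companion structure embedded in the same allocation and freed by a single release callback — can be sketched in miniature. All names below are hypothetical illustrations, not the actual block/SCSI structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: reference-count only the "queue" and embed the
 * subsystem-private part (a scsi_device-like blob) in the same
 * object, so one release callback frees both at once.  This avoids
 * the circular-dependency problem of two counters tracking one
 * entity's lifetime. */

struct queue {
	int refcount;
	void (*release)(struct queue *);
	int sdev_state;		/* embedded private data, dies with queue */
};

static int releases;		/* how many times release ran */

static void queue_release(struct queue *q)
{
	releases++;
	free(q);		/* queue and embedded sdev go together */
}

static struct queue *queue_alloc(void)
{
	struct queue *q = calloc(1, sizeof(*q));
	q->refcount = 1;
	q->release = queue_release;
	return q;
}

static void queue_get(struct queue *q)
{
	q->refcount++;
}

static void queue_put(struct queue *q)
{
	if (--q->refcount == 0)
		q->release(q);
}
```

Mike's counter-proposal amounts to keeping two separate allocations but making both subsystems take references through this same single counter, rather than merging the structs.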
Re: LANANA: To Pending Device Number Registrants
[EMAIL PROTECTED] [[EMAIL PROTECTED]] wrote:
> In principle the kernel could just number the devices it sees 1,2,...
> and export information about them, so that user space can choose
> the right number.
> The part about exporting information is good. User space needs to
> be able to ask if a certain beast is a CD reader, and if so what
> manufacturer and model.
> But the part about numbering 1,2,... may not be good enough, e.g.
> because it does not survive reboots. If we remain Unix-like and use
> device nodes in user space to pair a file name with a number, then
> it would be very nice if the number encoded the device path uniquely.
> Many programs expect this.
> It cannot be done in all cases, but a good approximation is obtained
> if the number is a hash of the device path. In so far as the hash is
> collision free we obtain numbers that stay unique over a reboot.
>
> Andries

I disagree that the kernel should apply sequence numbers as devices are found, or hash path information into the device name. I am unclear on the need for hashing the path into the name.

In the ptx operating system I previously worked on, we ID'd everything that we could get a UUID from, and for ones that we could not, we generated a pseudo one. We also split the UUID space up by config type. This is similar to the discussion in Andreas's mail. Non-ID'd devices could possibly slip, but ID'd ones did not. In user space we allowed the user to select any name for a device (the user space to kernel connection was made by UUID). The solution worked on SCSI and FC based devices (Linux obviously deals with many more device name spaces). I thought that with devfs, devreg, and non-allocated major/minor numbers, a similar capability would be possible. The "/dev" usage would not need to know the path, but methods would be available to make the relationship when needed.
-Mike -- Michael Anderson [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
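For concreteness, the numbering scheme Andries describes — derive a device number from a hash of the device path, so the same path maps to the same number across reboots — might look like this. This is a minimal illustration only; the djb2 hash and the 20-bit fold are arbitrary choices here, and the collision caveat from the quoted mail applies:

```c
#include <assert.h>

/* Map a device path to a (hopefully) stable number: same path gives
 * the same number on every boot, but distinct paths can collide,
 * which is exactly the weakness discussed above. */
static unsigned long path_to_number(const char *path)
{
	unsigned long h = 5381;		/* djb2 as a stand-in hash */

	while (*path)
		h = h * 33 + (unsigned char)*path++;

	return h & 0xfffff;		/* fold into a 20-bit number space */
}
```

Determinism is what survives the reboot; uniqueness is only probabilistic, which is the part of the proposal Mike objects to.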
Re: Linux Cluster using shared scsi
Doug,

I guess I worded my question poorly. My question was around multi-path devices in combination with SCSI-2 reserve vs SCSI-3 persistent reserve, which has not always been easy, but is more difficult if you use a name space that can slip or can have multiple entries for the same physical device you want to reserve.

But here is a second try. If this is a failover cluster, then node A will need to reserve all disks in shareable space using sg, or only a subset if node A has sync'd his sd name space with the other node and they both wish to do work in disjoint pools of disks. In the scenario of grabbing all the disks: if sda and sdb are the same device, then I can only reserve one of them and must ensure IO only goes down through the one I reserved, otherwise I could get a reservation conflict. This goes along with your previous patch on supporting multi-path at "md" and translating this into the proper device to reserve. I guess it is up to the caller of your service to handle this case, correct?

If this is not any clearer than my last mail I will just wait to see the code :-).

Thanks,
-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> To: Mike Anderson <[EMAIL PROTECTED]>
> cc: [EMAIL PROTECTED], James Bottomley
> <[EMAIL PROTECTED]>, "Roets, Chris"
> <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> [EMAIL PROTECTED]
>
> Mike Anderson wrote:
> >
> > Doug,
> >
> > A question on clarification.
> >
> > Does the configuration you are testing have both FC adapters going to the same
> > port of the storage device (multi-path) or to different ports of the storage
> > device (multi-port)?
> >
> > The reason I ask is that I thought if you are using SCSI-2 reserves that the
> > reserve was on a per initiator basis. How does one know which path has the
> > reserve?
> Reservations are global in nature in that a reservation with a device will
> block access to that device from all other initiators, including across
> different ports on multiport devices (or else they are broken and need a
> firmware update).
>
> > On a side note. I thought the GFS project had up-leveled their locking /
> > fencing into an API called a locking harness to support different kinds of
> > fencing methods. Any thoughts on whether this capability could be plugged
> > into this service so that users could reduce recoding depending on which
> > fencing support they selected?
>
> I wouldn't know about that.
>
> --
> Doug Ledford <[EMAIL PROTECTED]> http://people.redhat.com/dledford
> Please check my web site for aic7xxx updates/answers before
> e-mailing me about problems

--
Michael Anderson
[EMAIL PROTECTED]
IBM Linux Technology Center - Storage IO
Phone (503) 578-4466 Tie Line: 775-4466
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
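For reference, issuing a SCSI-2 reservation from user space through sg comes down to sending a RESERVE(6) CDB down one specific /dev/sg node — which is exactly why the path it travels matters for multi-path devices. A minimal sketch of just the CDB construction (the SG_IO plumbing and the actual device I/O are elided, and `build_reserve6_cdb` is a hypothetical helper, not code from Doug's package):

```c
#include <assert.h>
#include <string.h>

#define RESERVE_6 0x16	/* SCSI-2 RESERVE(6) opcode */

/* Fill a 6-byte RESERVE CDB.  With all remaining bytes zero this is a
 * plain device reservation (no third-party ID, no extents).  In real
 * use this CDB would be handed to the sg driver via an SG_IO-style
 * request on a specific /dev/sg node, tying the reservation to the
 * initiator/path that node represents. */
static void build_reserve6_cdb(unsigned char cdb[6])
{
	memset(cdb, 0, 6);
	cdb[0] = RESERVE_6;
}
```

Because the reservation is held per initiator path, sending this CDB down sda's path while doing IO through sdb (the same physical device) risks exactly the reservation conflict Mike describes.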
Re: Linux Cluster using shared scsi
Doug,

A question on clarification.

Does the configuration you are testing have both FC adapters going to the same port of the storage device (multi-path) or to different ports of the storage device (multi-port)?

The reason I ask is that I thought if you are using SCSI-2 reserves that the reserve was on a per initiator basis. How does one know which path has the reserve?

On a side note. I thought the GFS project had up-leveled their locking / fencing into an API called a locking harness to support different kinds of fencing methods. Any thoughts on whether this capability could be plugged into this service so that users could reduce recoding depending on which fencing support they selected?

Thanks,
-Mike

Doug Ledford [[EMAIL PROTECTED]] wrote:
> To: [EMAIL PROTECTED]
> cc: James Bottomley <[EMAIL PROTECTED]>, "Roets, Chris"
> <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
> [EMAIL PROTECTED]
>
> "Eric Z. Ayers" wrote:
> >
> > Doug Ledford writes:
> > (James Bottomley commented about the need for SCSI reservation kernel patches)
> >
> > > I agree. It's something that needs fixed in general, your software needs it
> > > as well, and I've written (about 80% done at this point) some open source
> > > software geared towards getting/holding reservations that also requires the
> > > same kernel patches (plus one more to be fully functional, an ioctl to allow a
> > > SCSI reservation to do a forced reboot of a machine). I'll be releasing that
> > > package in the short term (once I get back from my vacation anyway).
> >
> > Hello Doug,
> >
> > Does this package also tell the kernel to "re-establish" a
> > reservation for all devices after a bus reset, or at least inform a
> > user level program? Finding out when there has been a bus reset has
> > been a stumbling block for me.
>
> It doesn't have to.
> The kernel changes are minimal (basically James' SCSI
> reset patch that he's been carrying around, the scsi reservation conflict
> patch, and I need to write a third patch that makes the system optionally
> reboot immediately on a reservation conflict and which is controlled by an
> ioctl, but I haven't done that patch yet). All of the rest is implemented in
> user space via the /dev/sg entries. As such, it doesn't have any more
> information about bus resets than you do. However, because of the policy
> enacted in the code, it doesn't need to. Furthermore, because there are so
> many ways to lose a reservation silently, it's foolhardy to try and keep
> reservation consistency any way other than something similar to what I
> outline below.
>
> The package is meant to be a sort of "scsi reservation" library. The
> application that uses the library is responsible for setting policy. I wrote
> a small, simple application that actually does a decent job of implementing
> policy on the system. The policy it does implement is simple:
>
> If told to get a reservation, then attempt to get it. If the attempt is
> blocked by an existing reservation and we aren't supposed to reset the drive,
> then exit. If it's blocked and we are supposed to reset the drive, then send
> a device reset, then wait 5 seconds, then try to get the reservation. If we
> again fail, then the other machine is still alive (as proven by the fact that
> it re-established its reservation after the reset) and we exit, else we now
> have the reservation.
>
> If told to forcefully get a reservation, then attempt to get it. If the
> attempt fails, then reset the device and try again immediately (no 5 second
> wait); if it fails again, then exit.
>
> If told to hold a reservation, then resend your reservation request once every
> 2 seconds (this actually has very minimal CPU/BUS usage and isn't as big a
> deal as requesting a reservation every 2 seconds might sound).
> The first time
> the reservation is refused, consider the reservation stolen by another machine
> and exit (or optionally, reboot).
>
> The package is meant to lock against itself (in other words, a malicious user
> with write access to the /dev/sg entries could confuse this locking mechanism,
> but it will work cooperatively with other copies of itself running on other
> machines); the requirements for the locking to be safe are as follows:
>
> 1) A machine is not allowed to mount or otherwise use a drive in any way
> shape or form until it has successfully acquired a reservation.
>
> 2) Once a machine has a reservation, it is not allowed to ever take any
> action to break another machine's reservation, so that if the reservation is
> stolen, this machine is required to "gracefully" step away from the drive
> (rebooting is the best way to accomplish this since even the act of unmounting
> the drive will attempt to write to it).
>
> 3) The timeouts in the program must be honored (resend your reservation, when
> you hold it, every 2 seconds so that a passive attempt to steal the
> reservation will see you are still alive within the 5 second timeout and
> leave you be, which is a sort of heartbeat in and of itself).
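The acquisition policy Doug describes (try to reserve; if blocked, reset, wait, retry once; if the holder re-reserved after the reset it is alive and we back off) can be modeled as a small state machine. This is a toy user-space model with a stand-in device struct, not the real sg-based implementation, and the 5-second wait is represented only by a comment:

```c
#include <assert.h>

/* Toy model of the reserve-with-reset policy described above. */

struct dev {
	int reserved_by;	/* 0 = free, else holding node id */
	int holder_alive;	/* a live holder re-reserves after reset */
};

static void bus_reset(struct dev *d)
{
	d->reserved_by = 0;		/* a reset drops the reservation */
	if (d->holder_alive)
		d->reserved_by = 1;	/* live holder re-establishes it */
}

/* Node 2 tries to acquire; returns 1 on success, 0 if it must back
 * off because the current holder proved it is still alive. */
static int acquire(struct dev *d)
{
	if (!d->reserved_by) {
		d->reserved_by = 2;
		return 1;
	}
	bus_reset(d);			/* then wait ~5 seconds in reality */
	if (!d->reserved_by) {
		d->reserved_by = 2;	/* holder was dead: take over */
		return 1;
	}
	return 0;			/* holder alive: step away */
}
```

The "hold" side of the policy is the mirror image: re-reserve every 2 seconds, and treat the first refusal as proof the reservation was stolen.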