Re: Distributed storage. Security attributes and ducumentation update.
Hi Pavel.

On Mon, Sep 17, 2007 at 06:22:30PM +, Pavel Machek ([EMAIL PROTECTED]) wrote:
> > I'm pleased to announce third release of the distributed storage
> > subsystem, which allows to form a storage on top of remote and local
> > nodes, which in turn can be exported to another storage as a node to
> > form tree-like storages.
>
> How is this different from raid0/1 over nbd? Or raid0/1 over
> ata-over-ethernet?

I will repeat a quote I made for the previous release:

It has a number of advantages, outlined in the first release and on the
project homepage, namely:
 * non-blocking processing without busy loops (compared to iSCSI and NBD)
 * small, pluggable architecture
 * failover recovery (reconnect to remote target)
 * autoconfiguration
 * no additional allocations (not including the network part);
   device mapper needs at least two in its fast path
 * very simple - try to compare with iSCSI
 * works with different network protocols
 * storage can be formed on top of remote nodes and be exported
   simultaneously (iSCSI is peer-to-peer only, NBD requires device
   mapper, is synchronous and wants a special userspace thread)

DST allows any node to be removed and then turned back into the storage
without breaking the dataflow: the DST core reconnects automatically to
failed remote nodes. Detached devices can be used just like usual
filesystems (unless the device was formed as part of a linear storage,
since in that case meta information is spread between nodes).

It does not require special processes on behalf of the network
connection; everything is performed automatically by the DST core
workers. It allows a new device to be exported that was created on top
of a mirror or linear combination of other devices, which in turn can be
formed on top of another one, and so on...

This was designed to allow creation of a distributed storage with
completely transparent failover recovery, with the ability to detach
remote nodes from a mirror array so that they become standalone realtime
backups (or snapshots), and to turn them back into the storage without
stopping the main device node.

> > +| DST storate ---|
>
> storage?

Yep, thanks.

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Distributed storage. Security attributes and ducumentation update.
Hi!

> I'm pleased to announce third release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

How is this different from raid0/1 over nbd? Or raid0/1 over
ata-over-ethernet?

> +| DST storate ---|

storage?

							Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Distributed storage. Security attributes and ducumentation update.
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
> Hi Paul.
>
> On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > > Further TODO list includes:
> > > * implement optional saving of mirroring/linear information on the remote
> > >   nodes (simple)
> > > * implement netlink based setup (simple)
> > > * new redundancy algorithm (complex)
> > >
> > > Homepage:
> > > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> >
> > A couple questions below, but otherwise looks good from an RCU viewpoint.
> >
> > 						Thanx, Paul
>
> Thanks for your comments, and sorry for the late reply, I was on the
> KS/London trip.
>
> > > +	if (--num) {
> > > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> >
> > This function is called under rcu_read_lock() or similar, right?
> > (Can't tell from this patch.)  It is also OK to call it from under the
> > update-side mutex, of course.
>
> Actually not, but it does not require it, since an entry cannot be removed
> during these operations: the appropriate reference counter for the given
> node is being held. It should not be RCU at all.

Ah!  Yes, it is OK to use _rcu in this case, but it should be avoided
unless doing so eliminates duplicate code or some such.  So, agree with
dropping _rcu in this case.

> > > +static int dst_mirror_read(struct dst_request *req)
> > > +{
> > > +	struct dst_node *node = req->node, *n, *min_dist_node;
> > > +	struct dst_mirror_priv *priv = node->priv;
> > > +	u64 dist, d;
> > > +	int err;
> > > +
> > > +	req->bio_endio = &dst_mirror_read_endio;
> > > +
> > > +	do {
> > > +		err = -ENODEV;
> > > +		min_dist_node = NULL;
> > > +		dist = -1ULL;
> > > +
> > > +		/*
> > > +		 * Reading is never performed from the node under resync.
> > > +		 * If this will cause any troubles (like all nodes must be
> > > +		 * resynced between each other), this check can be removed
> > > +		 * and per-chunk dirty bit can be tested instead.
> > > +		 */
> > > +
> > > +		if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > > +			priv = node->priv;
> > > +			if (req->start > priv->last_start)
> > > +				dist = req->start - priv->last_start;
> > > +			else
> > > +				dist = priv->last_start - req->start;
> > > +			min_dist_node = req->node;
> > > +		}
> > > +
> > > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> >
> > I see one call to this function that appears to be under the update-side
> > mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> > under the update-side mutex or under rcu_read_lock() and friends.)
>
> The same here - those processing functions are called from
> generic_make_request() without any lock on top of them. Each node is linked
> into the list of the first added node, whose reference counter is
> increased in a higher layer. Right now there is no way to add or remove
> nodes after the array was started; such functionality requires the storage
> tree lock to be taken, and RCU cannot be used (since it requires sleeping
> and I did not investigate sleepable RCU for this purpose).
>
> So, essentially RCU is not used in DST :)

Works for me!  "Use the right tool for the job!"

> Thanks for review, Paul.

						Thanx, Paul
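The read-balancing idea visible in the quoted dst_mirror_read() fragment (skip nodes under resync, prefer the node whose last request started closest to the new one) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the DST code: struct mirror_node, sector_distance() and pick_read_node() are hypothetical names, an array stands in for the kernel list, and since the body of the loop over the shared list is not visible in the quoted patch, comparing every node the same way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a mirror node; not the DST structure. */
struct mirror_node {
	uint64_t last_start;	/* sector where the previous request started */
	bool not_synced;	/* true while the node is under resync */
};

/* Absolute distance between two sector numbers, avoiding unsigned underflow. */
static uint64_t sector_distance(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * Return the index of the in-sync node whose last request started closest
 * to 'start', or -1 when no node is eligible (all under resync, or count
 * is zero). On equal distances the earlier node wins.
 */
static int pick_read_node(const struct mirror_node *nodes, size_t count,
			  uint64_t start)
{
	uint64_t best = UINT64_MAX;	/* same role as dist = -1ULL above */
	int best_idx = -1;

	assert(nodes != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		uint64_t d;

		if (nodes[i].not_synced)
			continue;	/* never read from a node under resync */

		d = sector_distance(start, nodes[i].last_start);
		if (best_idx < 0 || d < best) {
			best = d;
			best_idx = (int)i;
		}
	}
	return best_idx;
}
```

A caller would pass the request's start sector and, on a non-negative result, submit the read to that node and update its last_start.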
Re: Distributed storage. Security attributes and ducumentation update.
Hi Paul.

On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> >   nodes (simple)
> > * implement netlink based setup (simple)
> > * new redundancy algorithm (complex)
> >
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
>
> A couple questions below, but otherwise looks good from an RCU viewpoint.
>
> 						Thanx, Paul

Thanks for your comments, and sorry for the late reply, I was on the
KS/London trip.

> > +	if (--num) {
> > +		list_for_each_entry_rcu(n, &node->shared, shared) {
>
> This function is called under rcu_read_lock() or similar, right?
> (Can't tell from this patch.)  It is also OK to call it from under the
> update-side mutex, of course.

Actually not, but it does not require it, since an entry cannot be removed
during these operations: the appropriate reference counter for the given
node is being held. It should not be RCU at all.

> > +static int dst_mirror_read(struct dst_request *req)
> > +{
> > +	struct dst_node *node = req->node, *n, *min_dist_node;
> > +	struct dst_mirror_priv *priv = node->priv;
> > +	u64 dist, d;
> > +	int err;
> > +
> > +	req->bio_endio = &dst_mirror_read_endio;
> > +
> > +	do {
> > +		err = -ENODEV;
> > +		min_dist_node = NULL;
> > +		dist = -1ULL;
> > +
> > +		/*
> > +		 * Reading is never performed from the node under resync.
> > +		 * If this will cause any troubles (like all nodes must be
> > +		 * resynced between each other), this check can be removed
> > +		 * and per-chunk dirty bit can be tested instead.
> > +		 */
> > +
> > +		if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > +			priv = node->priv;
> > +			if (req->start > priv->last_start)
> > +				dist = req->start - priv->last_start;
> > +			else
> > +				dist = priv->last_start - req->start;
> > +			min_dist_node = req->node;
> > +		}
> > +
> > +		list_for_each_entry_rcu(n, &node->shared, shared) {
>
> I see one call to this function that appears to be under the update-side
> mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> under the update-side mutex or under rcu_read_lock() and friends.)

The same here - those processing functions are called from
generic_make_request() without any lock on top of them. Each node is linked
into the list of the first added node, whose reference counter is
increased in a higher layer. Right now there is no way to add or remove
nodes after the array was started; such functionality requires the storage
tree lock to be taken, and RCU cannot be used (since it requires sleeping
and I did not investigate sleepable RCU for this purpose).

So, essentially RCU is not used in DST :)

Thanks for review, Paul.

--
	Evgeniy Polyakov
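The argument above (a held reference keeps the node alive, and list membership does not change once the array is started, so a plain traversal is enough) can be sketched in userspace C11. This is a minimal sketch under assumptions, not DST code: struct pinned_node, node_get(), node_put(), node_can_remove() and sum_ids() are hypothetical names, and the rule that a node may be unlinked only when the list owner holds the sole reference is an illustrative policy, not something stated in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type for illustration; field names are not from DST. */
struct pinned_node {
	atomic_int refcnt;		/* holders keeping the node alive */
	struct pinned_node *next;	/* plain singly linked list, no RCU */
	int id;
};

/* Take one reference on a node the caller can already reach safely. */
static void node_get(struct pinned_node *n)
{
	atomic_fetch_add(&n->refcnt, 1);
}

/* Drop one reference; returns true when the caller dropped the last one. */
static bool node_put(struct pinned_node *n)
{
	return atomic_fetch_sub(&n->refcnt, 1) == 1;
}

/* Illustrative policy: unlinking is allowed only with a single holder left. */
static bool node_can_remove(struct pinned_node *n)
{
	return atomic_load(&n->refcnt) == 1;
}

/*
 * Walk the list with ordinary loads. This is valid only because the
 * reference taken on the head pins it and, by assumption, nodes are never
 * added or removed while the walk is running.
 */
static int sum_ids(struct pinned_node *head)
{
	int sum = 0;

	node_get(head);
	for (struct pinned_node *n = head; n != NULL; n = n->next)
		sum += n->id;
	node_put(head);
	return sum;
}
```

If nodes could be added or removed while readers are running, this scheme would no longer be sufficient and a lock or an RCU read-side critical section would be needed around the walk.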