Re: Distributed storage. Security attributes and documentation update.

2007-09-22 Thread Evgeniy Polyakov
Hi Pavel.

On Mon, Sep 17, 2007 at 06:22:30PM +, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
> > I'm pleased to announce the third release of the distributed storage
> > subsystem, which allows forming a storage on top of remote and local
> > nodes, which in turn can be exported to another storage as a node to
> > form tree-like storages.
> 
> How is this different from raid0/1 over nbd? Or raid0/1 over
> ata-over-ethernet?

I will repeat what I wrote for the previous release:

It has a number of advantages, outlined in the first release and on the
project homepage, namely:

* non-blocking processing without busy loops (compared to iSCSI and NBD)
* small, pluggable architecture
* failover recovery (reconnect to remote target)
* autoconfiguration
* no additional allocations (not including the network part), whereas
  device mapper needs at least two in its fast path
* very simple (compare it with iSCSI)
* works with different network protocols
* storage can be formed on top of remote nodes and be exported
  simultaneously (iSCSI is peer-to-peer only; NBD requires device
  mapper, is synchronous and wants a special userspace thread)

DST allows any node to be removed and later turned back into the storage
without breaking the dataflow. The DST core reconnects automatically to
failed remote nodes. Detached devices can be used just like ordinary
filesystems (unless the device was formed as part of a linear storage,
since in that case the meta information is spread between nodes).

It does not require special processes to handle the network connections;
everything is performed automatically by the DST core workers. It allows
exporting a new device created on top of a mirror or linear combination
of others, which in turn can be formed on top of another, and so on...
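
As an illustration of what a "linear combination" of nodes means, the
sketch below maps a sector of the combined device to a child node and an
offset inside it. This is a generic user-space illustration of linear
concatenation, not DST code: the thread does not show how DST does this
mapping, and struct linear_child and linear_map() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child of a linear storage: only its size in sectors. */
struct linear_child {
	uint64_t size;
};

/*
 * Map a sector of the combined device to (child index, sector inside
 * that child). Returns 0 on success, -1 if the sector lies past the end
 * of the whole storage.
 */
static int linear_map(const struct linear_child *children, size_t count,
		      uint64_t sector, size_t *child, uint64_t *offset)
{
	size_t i;

	assert(child != NULL && offset != NULL);

	for (i = 0; i < count; i++) {
		if (sector < children[i].size) {
			*child = i;
			*offset = sector;
			return 0;
		}
		sector -= children[i].size;	/* skip this child's range */
	}
	return -1;
}
```

Because each child only covers its own range, a child taken out of such
a storage holds only a slice of the data, which fits the remark above
that nodes of a linear storage cannot be used standalone.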

This was designed to allow creating a distributed storage with completely
transparent failover recovery, with the ability to detach remote nodes
from a mirror array to become standalone realtime backups (or snapshots)
and turn them back into the storage without stopping the main device node.

> > +| DST storate ---|
> 
> storage?

Yep, thanks.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Distributed storage. Security attributes and documentation update.

2007-09-22 Thread Pavel Machek
Hi!

> I'm pleased to announce the third release of the distributed storage
> subsystem, which allows forming a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

How is this different from raid0/1 over nbd? Or raid0/1 over
ata-over-ethernet?

> +| DST storate ---|

storage?


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: Distributed storage. Security attributes and documentation update.

2007-09-13 Thread Paul E. McKenney
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
> Hi Paul.
> 
> On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
> > > Further TODO list includes:
> > > * implement optional saving of mirroring/linear information on the remote
> > >   nodes (simple)
> > > * implement netlink based setup (simple)
> > > * new redundancy algorithm (complex)
> > > 
> > > Homepage:
> > > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> > 
> > A couple questions below, but otherwise looks good from an RCU viewpoint.
> > 
> > Thanx, Paul
> 
> Thanks for your comments, and sorry for the late reply; I was on the
> KS/London trip.
>
> > > +	if (--num) {
> > > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > This function is called under rcu_read_lock() or similar, right?
> > (Can't tell from this patch.)  It is also OK to call it from under the
> > update-side mutex, of course.
> 
> Actually not, but it does not require it: an entry cannot be removed
> during these operations, since the appropriate reference counter for the
> given node is held. It should not be RCU at all.

Ah!  Yes, it is OK to use _rcu in this case, but it should be avoided
unless doing so eliminates duplicate code or some such.  So, I agree
with dropping _rcu in this case.

> > > +static int dst_mirror_read(struct dst_request *req)
> > > +{
> > > +	struct dst_node *node = req->node, *n, *min_dist_node;
> > > +	struct dst_mirror_priv *priv = node->priv;
> > > +	u64 dist, d;
> > > +	int err;
> > > +
> > > +	req->bio_endio = &dst_mirror_read_endio;
> > > +
> > > +	do {
> > > +		err = -ENODEV;
> > > +		min_dist_node = NULL;
> > > +		dist = -1ULL;
> > > +
> > > +		/*
> > > +		 * Reading is never performed from the node under resync.
> > > +		 * If this will cause any troubles (like all nodes must be
> > > +		 * resynced between each other), this check can be removed
> > > +		 * and per-chunk dirty bit can be tested instead.
> > > +		 */
> > > +
> > > +		if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > > +			priv = node->priv;
> > > +			if (req->start > priv->last_start)
> > > +				dist = req->start - priv->last_start;
> > > +			else
> > > +				dist = priv->last_start - req->start;
> > > +			min_dist_node = req->node;
> > > +		}
> > > +
> > > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> > 
> > I see one call to this function that appears to be under the update-side
> > mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> > under the update-side mutex or under rcu_read_lock() and friends.)
> 
> The same here: those processing functions are called from
> generic_make_request() without any lock held on top of them. Each node is
> linked into the list of the first added node, whose reference counter is
> increased in a higher layer. Right now there is no way to add or remove
> nodes after the array has been started; such functionality requires the
> storage tree lock to be taken, and RCU cannot be used (since it requires
> sleeping and I did not investigate sleepable RCU for this purpose).
> 
> So, essentially RCU is not used in DST :)

Works for me!  "Use the right tool for the job!"

> Thanks for review, Paul.

Thanx, Paul


Re: Distributed storage. Security attributes and documentation update.

2007-09-13 Thread Evgeniy Polyakov
Hi Paul.

On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) 
wrote:
> > Further TODO list includes:
> > * implement optional saving of mirroring/linear information on the remote
> > nodes (simple)
> > * implement netlink based setup (simple)
> > * new redundancy algorithm (complex)
> > 
> > Homepage:
> > http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
> 
> A couple questions below, but otherwise looks good from an RCU viewpoint.
> 
>   Thanx, Paul

Thanks for your comments, and sorry for the late reply; I was on the
KS/London trip.

> > +	if (--num) {
> > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> This function is called under rcu_read_lock() or similar, right?
> (Can't tell from this patch.)  It is also OK to call it from under the
> update-side mutex, of course.

Actually not, but it does not require it: an entry cannot be removed
during these operations, since the appropriate reference counter for the
given node is held. It should not be RCU at all.

> > +static int dst_mirror_read(struct dst_request *req)
> > +{
> > +	struct dst_node *node = req->node, *n, *min_dist_node;
> > +	struct dst_mirror_priv *priv = node->priv;
> > +	u64 dist, d;
> > +	int err;
> > +
> > +	req->bio_endio = &dst_mirror_read_endio;
> > +
> > +	do {
> > +		err = -ENODEV;
> > +		min_dist_node = NULL;
> > +		dist = -1ULL;
> > +
> > +		/*
> > +		 * Reading is never performed from the node under resync.
> > +		 * If this will cause any troubles (like all nodes must be
> > +		 * resynced between each other), this check can be removed
> > +		 * and per-chunk dirty bit can be tested instead.
> > +		 */
> > +
> > +		if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
> > +			priv = node->priv;
> > +			if (req->start > priv->last_start)
> > +				dist = req->start - priv->last_start;
> > +			else
> > +				dist = priv->last_start - req->start;
> > +			min_dist_node = req->node;
> > +		}
> > +
> > +		list_for_each_entry_rcu(n, &node->shared, shared) {
> 
> I see one call to this function that appears to be under the update-side
> mutex, but I cannot tell if the other calls are safe.  (Safe as in either
> under the update-side mutex or under rcu_read_lock() and friends.)

The same here: those processing functions are called from
generic_make_request() without any lock held on top of them. Each node is
linked into the list of the first added node, whose reference counter is
increased in a higher layer. Right now there is no way to add or remove
nodes after the array has been started; such functionality requires the
storage tree lock to be taken, and RCU cannot be used (since it requires
sleeping and I did not investigate sleepable RCU for this purpose).

So, essentially RCU is not used in DST :)
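
For readers following the quoted dst_mirror_read() fragment, the node
selection it starts to implement can be sketched in user-space C as below.
This is only a sketch under assumptions: the quoted patch is cut off before
the body of the list loop, so the loop here is a guess at the intent (pick
the in-sync node whose last request started closest to the new one), and
struct mirror_node, distance() and pick_read_node() are hypothetical names,
not DST identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mirror node; not the DST structure. */
struct mirror_node {
	uint64_t last_start;	/* start of the last request sent to this node */
	int not_synced;		/* nonzero while the node is under resync */
};

/* Absolute distance between two unsigned offsets, without underflow. */
static uint64_t distance(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * Return the index of the in-sync node whose last request started closest
 * to 'start', or -1 if every node is under resync (the -ENODEV case in the
 * patch). Ties go to the lowest index.
 */
static int pick_read_node(const struct mirror_node *nodes, size_t count,
			  uint64_t start)
{
	uint64_t best = UINT64_MAX;	/* same role as dist = -1ULL */
	int found = -1;
	size_t i;

	assert(nodes != NULL || count == 0);

	for (i = 0; i < count; i++) {
		uint64_t d;

		if (nodes[i].not_synced)
			continue;	/* never read from a node under resync */

		d = distance(start, nodes[i].last_start);
		if (found < 0 || d < best) {
			best = d;
			found = (int)i;
		}
	}
	return found;
}
```

The sketch uses a plain array and takes no locks, which mirrors the point
made above: the set of nodes cannot change once the array has been started.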

Thanks for review, Paul.

-- 
Evgeniy Polyakov

