Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker


On 11/27/07 7:27 PM, "Rusty Russell" <[EMAIL PROTECTED]> wrote:

> On Tuesday 27 November 2007 16:35:42 Tom Tucker wrote:
>> On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
>> Explicitly documenting what comprises the kernel API (external,
>> supported) and what comprises the kernel implementation (internal, not
>> supported).
> 
> But the former is currently an empty set.
> 

Yes, I overstated this.

>> - making it obvious to developers when they are binding their
>> implementation to a particular kernel release
> 
> See, there's your problem.  All interfaces can, and will, change.  You're
> always binding yourself to a particular release.
> 

Absolutely in the limit. But there are many bits of code that work quite
nicely from release to release because they use services that live in the
smooth water in the wake of the Linux head.

I think defining that smooth water has merit. I also think that it would be
nice to limit the scope of module externs to avoid polluting the global
namespace. I'm not sure that this particular patch reaches these goals, but
it prompted me to comment.

> So you're not proposing we mark what's not stable, you're arguing that we
> create a subset which is stable.
> 

Well, this is an interesting question. The answer is I think both are
important. It would be nice (and arguably necessary long term) to limit the
scope of externs. This can be accomplished with name spaces "I want bob's
implementation of read."

I think it also has value to define interfaces that are considered stable
(but not inviolate) to allow developers to make better informed decisions
when choosing interfaces. Having this info explicit in the code seems
logical to me.

> That's an argument we're not (yet) having.
> 

Yeah, maybe I'm off in the weeds on this one...

Tom

> Cheers,
> Rusty.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-27 Thread Tom Tucker

On Tue, 2007-11-27 at 18:15 +0100, Adrian Bunk wrote:
> On Mon, Nov 26, 2007 at 11:35:42PM -0600, Tom Tucker wrote:
> > On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
> >...
> > > No.  That's the wrong question.  What's the real upside?
> > 
> > Explicitly documenting what comprises the kernel API (external,
> > supported) and what comprises the kernel implementation (internal, not
> > supported).
> >...
> 
> There is not, never was, and never will be, any supported external API 
> of the kernel.

Philosophically I understand what you're saying, but in practical terms
there is the issue of managing core API like kmalloc. Although kmalloc
_could_ change, doing so would be extremely painful. In fact anyone who
proposed such a change would have to have a profoundly powerful argument
as to why it was necessary.

I think this patchset is an attempt to make it easier to identify and
review these kinds of interfaces.

>  
> cu
> Adrian
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-26 Thread Tom Tucker

On Tue, 2007-11-27 at 15:49 +1100, Rusty Russell wrote:
> On Monday 26 November 2007 17:15:44 Roland Dreier wrote:
> >  > Except C doesn't have namespaces and this mechanism doesn't create them.
> >  >  So this is just complete and utter makework; as I said before, noone's
> >  > going to confuse all those udp_* functions if they're not in the udp
> >  > namespace.
> >
> > I don't understand why you're so opposed to organizing the kernel's
> > exported symbols in a more self-documenting way.
> 
> No, I was the one who moved exports near their declarations.  That's 
> organised.  I just don't see how this new "organization" will help: oh good, 
> I won't accidentally use the udp functions any more?!?
> 
> > It seems pretty   
> > clear to me that having a mechanism that requires modules to make
> > explicit which (semi-)internal APIs makes reviewing easier
> 
> Perhaps you've got lots of patches were people are using internal APIs they 
> shouldn't?
> 

Maybe the issue is "who can tell" since what is external and what is
internal is not explicitly defined?

> > , makes it 
> > easier to communicate "please don't use that API" to module authors,
> 
> Well, introduce an EXPORT_SYMBOL_INTERNAL().  It's a lot less code.  But 
> you'd 
> still need to show that people are having trouble knowing what APIs to use.

> > and takes at least a small step towards bringing the kernel's exported
> > API under control.
> 
> There is no "exported API" to bring under control.  

Hmm...apparently, there are those that are struggling...

> There are symbols we 
> expose for the kernel's own use which can be used by external modules at 
> their own risk.  
> 
> > What's the real downside? 
> 
> No.  That's the wrong question.  What's the real upside?

Explicitly documenting what comprises the kernel API (external,
supported) and what comprises the kernel implementation (internal, not
supported).

> 
> Let's not put code in the core because "it doesn't seem to hurt".
> 

agreed.

> I'm sure you think there's a real problem, but I'm still waiting for someone 
> to *show* it to me.  Then we can look at solutions.

I think the benefits should include:

- forcing developers to identify their exports as part of the
implementation or as part of the kernel API

- making it easier for reviewers to identify when developers are adding
to the kernel API and thereby focusing the appropriate level of review
to the new function

- making it obvious to developers when they are binding their
implementation to a particular kernel release



> Rusty.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

> > I think removing the RDMA stack is the wrong thing to do, and you 
> > shouldn't just threaten to yank entire subsystems because you don't like 
> > the technology.  Lets keep this constructive, can we?  RDMA should get 
> > the respect of any other technology in Linux.  Maybe its a niche in your 
> > opinion, but come on, there's more RDMA users than say, the sparc64 
> > port.  Eh?
> 
> It's not about being a niche.  It's about creating a maintainable 
> software net stack that has predictable behavior.

Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

> 
> Needing to reach out of the RDMA sandbox and reserve net stack resources 
> away from itself travels a path we've consistently avoided.
> 
> 
> >> I will NACK any patch that opens up sockets to eat up ports or
> >> anything stupid like that.
> > 
> > Got it.
> 
> Ditto for me as well.
> 
>   Jeff
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] amso1100: QP init bug in amso driver

2007-07-24 Thread Tom Tucker
Roland:

The guys at UNH found this and fixed it. I'm surprised no
one has hit this before. I guess it only breaks when the 
refcount on the QP is non-zero.

Initialize the wait_queue_head_t in the c2_qp structure.

Signed-off-by: Ethan Burns <[EMAIL PROTECTED]>
Acked-by: Tom Tucker <[EMAIL PROTECTED]>

---
 drivers/infiniband/hw/amso1100/c2_qp.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c 
b/drivers/infiniband/hw/amso1100/c2_qp.c
index 420c138..01d0786 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -506,6 +506,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
qp->send_sgl_depth = qp_attrs->cap.max_send_sge;
qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge;
qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge;
+   init_waitqueue_head(&qp->wait);
 
/* Initialize the SQ MQ */
q_size = be32_to_cpu(reply->sq_depth);

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [openib-general] [PATCH 3/9] NetEffect 10Gb RNIC Driver: openfabrics connection manager c file

2006-10-27 Thread Tom Tucker

[...snip...]
> +extern void set_interface(
> +UINT32ip_addr,

These should probably be the standard linux types u32, or uint32

> +UINT32mask,
> +UINT32bcastaddr,
> +UINT32type
> +   );

[...snip...]

> + struct NES_sockaddr_in  inet_addr;
> + struct sockaddr_in  kinet_addr;

Is there some reason why you need your own sockaddr and sockaddr_in
structures? 

[...snip...]
> +
> +/**
> + * nes_disconnect
> + * 
> + * @param cm_id
> + * @param abrupt
> + * 
> + * @return int
> + */
> +int nes_disconnect(struct iw_cm_id *cm_id, int abrupt)
> +{
> + struct ib_qp_attr attr;
> + struct ib_qp *ibqp;
> + struct nes_qp *nesqp;
> + struct nes_dev *nesdev = to_nesdev(cm_id->device);
> + int err = 0;
> + u8 u8temp;
> +
> + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__);
> + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__,
> atomic_read(&nesdev->netdev->refcnt));
> +
> + /* If the qp was already destroyed, then there's no QP */
> + if (cm_id->provider_data == 0)
> + return 0;
> +
> + nesqp = (struct nes_qp *)cm_id->provider_data;
> + ibqp = &nesqp->ibqp;
> +
> + /* Disassociate the QP from this cm_id */
> + cm_id->provider_data = 0;
> + cm_id->rem_ref(cm_id);
> + nesqp->cm_id = 0;
> +
> + stack_ops_p->decelerate_socket(nesqp->socket, 
> +(struct nes_uploaded_qp_context *)
> +nesqp->nesqp_context);
> +  
> + if (nesqp->active_conn) {
> +   u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7);
> +   nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] &=
> ~(u8temp);
> + } else {
> + dev_put(nesdev->netdev);
> +/* Need to free the Last Streaming Mode Message */
> +pci_free_consistent(nesdev->pcidev, 
> +
> nesqp->private_data_len+sizeof(*nesqp->ietf_frame), 
> +nesqp->ietf_frame,
> +nesqp->ietf_frame_pbase);

This is mailer perversion. You need to turn off wrapping in your mailer.
It makes it hard to review the patch never mind apply it.

> +}
> +
> + if (nesqp->ksock) sock_release(nesqp->ksock);
> + stack_ops_p->sock_ops_p->close( nesqp->socket );
> + nesqp->ksock = 0;
> + nesqp->socket = 0;
> + if (nesqp->wq) {
> + destroy_workqueue(nesqp->wq);

This will deadlock if this function is called from a workqueue thread
and CONFIG_HOTPLUG_CPU is enabled. 

> + nesqp->wq = NULL;
> + }
> +
> + memset(&attr, 0, sizeof(struct ib_qp_attr));
> + if (abrupt)
> + attr.qp_state = IB_QPS_ERR;
> + else
> + attr.qp_state = IB_QPS_SQD;
> +
> + return err;
> +}
> +
> +
> +/**
> + * nes_accept
> + * 
> + * @param cm_id
> + * @param conn_param
> + * 
> + * @return int
> + */
> +int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param
> *conn_param)
> +{
> + struct nes_qp *nesqp;
> + struct nes_dev *nesdev;
> + struct nes_adapter *nesadapter;
> + struct ib_qp *ibqp;
> +struct nes_hw_qp_wqe *wqe;
> + struct nes_v4_quad nes_quad;
> + struct ib_qp_attr attr;
> +struct iw_cm_event cm_event;
> +
> + dprintk("%s:%s:%u: data len = %u\n", 
> + __FILE__, __FUNCTION__, __LINE__,
> conn_param->private_data_len);
> +
> + ibqp = nes_get_qp(cm_id->device, conn_param->qpn);
> + if (!ibqp)
> + return -EINVAL;
> + nesqp = to_nesqp(ibqp);
> + nesdev = to_nesdev(nesqp->ibqp.device);
> + nesadapter = nesdev->nesadapter;
> + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__,
> atomic_read(&nesdev->netdev->refcnt));
> +
> +nesqp->ietf_frame = pci_alloc_consistent(nesdev->pcidev, 
> +
> sizeof(*nesqp->ietf_frame)+conn_param->private_data_len,
> + &nesqp->ietf_frame_pbase);
> +if (!nesqp->ietf_frame) {
> +dprintk(KERN_ERR PFX "%s: Unable to allocate memory for private
> data\n", __FUNCTION__);
> +return -ENOMEM;
> +}
> +dprintk(PFX "%s: PCI consistent memory for "
> +"private data located @ %p (pa = 0x%08lX.) size = %u.\n", 
> +__FUNCTION__, nesqp->ietf_frame, (unsigned
> long)nesqp->ietf_frame_pbase,
> +conn_param->private_data_len+sizeof(*nesqp->ietf_frame));
> +nesqp->private_data_len = conn_param->private_data_len;
> +
> +strcpy(&nesqp->ietf_frame->key[0], IEFT_MPA_KEY_REP);
> +memcpy(&nesqp->ietf_frame->private_data, conn_param->private_data,
> conn_param->private_data_len);
> +nesqp->ietf_frame->private_data_size =
> cpu_to_be16(conn_param->private_data_len);
> +nesqp->ietf_frame->rev = mpa_version;
> +nesqp->ietf_frame->flags = IETF_MPA_FLAGS_CRC;
> +
> +wqe = &nesqp->hwqp.sq_vbase[0];
> +*((struct nes_

[PATCH] Add spinlocks to serialize ib_post_send/ib_post_recv

2006-10-03 Thread Tom Tucker
From: Tom Tucker <[EMAIL PROTECTED]>

The AMSO driver was not thread-safe in the post WR code and had
code that would sleep if the WR post FIFO was full. Since these
functions can be called on interrupt level I changed the sleep to a
udelay.

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/amso1100/c2_qp.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c 
b/drivers/infiniband/hw/amso1100/c2_qp.c
index 1226113..681c130 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -35,6 +35,7 @@
  *
  */
 
+#include 
 #include "c2.h"
 #include "c2_vq.h"
 #include "c2_status.h"
@@ -705,10 +706,8 @@ static inline void c2_activity(struct c2
 * cannot get on the bus and the card and system hang in a
 * deadlock -- thus the need for this code. [TOT]
 */
-   while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x8000) {
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(0);
-   }
+   while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x8000)
+   udelay(10);
 
__raw_writel(C2_HINT_MAKE(mq_index, shared),
 c2dev->regs + PCI_BAR0_ADAPTER_HINT);
@@ -766,6 +765,7 @@ int c2_post_send(struct ib_qp *ibqp, str
struct c2_dev *c2dev = to_c2dev(ibqp->device);
struct c2_qp *qp = to_c2qp(ibqp);
union c2wr wr;
+   unsigned long lock_flags;
int err = 0;
 
u32 flags;
@@ -881,8 +881,10 @@ int c2_post_send(struct ib_qp *ibqp, str
/*
 * Post the puppy!
 */
+   spin_lock_irqsave(&qp->lock, lock_flags);
err = qp_wr_post(&qp->sq_mq, &wr, qp, msg_size);
if (err) {
+   spin_unlock_irqrestore(&qp->lock, lock_flags);
break;
}
 
@@ -890,6 +892,7 @@ int c2_post_send(struct ib_qp *ibqp, str
 * Enqueue mq index to activity FIFO.
 */
c2_activity(c2dev, qp->sq_mq.index, qp->sq_mq.hint_count);
+   spin_unlock_irqrestore(&qp->lock, lock_flags);
 
ib_wr = ib_wr->next;
}
@@ -905,6 +908,7 @@ int c2_post_receive(struct ib_qp *ibqp, 
struct c2_dev *c2dev = to_c2dev(ibqp->device);
struct c2_qp *qp = to_c2qp(ibqp);
union c2wr wr;
+   unsigned long lock_flags;
int err = 0;
 
if (qp->state > IB_QPS_RTS)
@@ -945,8 +949,10 @@ int c2_post_receive(struct ib_qp *ibqp, 
break;
}
 
+   spin_lock_irqsave(&qp->lock, lock_flags);
err = qp_wr_post(&qp->rq_mq, &wr, qp, qp->rq_mq.msg_size);
if (err) {
+   spin_unlock_irqrestore(&qp->lock, lock_flags);
break;
}
 
@@ -954,6 +960,7 @@ int c2_post_receive(struct ib_qp *ibqp, 
 * Enqueue mq index to activity FIFO
 */
c2_activity(c2dev, qp->rq_mq.index, qp->rq_mq.hint_count);
+   spin_unlock_irqrestore(&qp->lock, lock_flags);
 
ib_wr = ib_wr->next;
}
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/2] iWARP Connection Manager.

2006-08-30 Thread Tom Tucker
On Wed, 2006-08-30 at 10:35 -0700, Roland Dreier wrote:
> OK, getting closer to finishing the merge...
> 
> anyway, why is iw_cm_private.h in include/rdma where it is visible
> everywhere?  As far as I can tell drivers/infiniband/core/iwcm.c is
> the only place it's included.  So why not just put this stuff in
> drivers/infiniband/core/iwcm.h and do

The data structures really belong in iwcm.c...but I have a KDB module
that dumps IB data structures. So when I was writing the IWCM, I pulled
them out where I could see them without include gymnastics. It seems
pretty dumb though to have header files called *private.h in a public
directory. Putting them in iwcm.h is fine with me... 

Here's a patch for the KDB code if anyone is interested...

KDB module for dumping OpenFabrics stack data types

From: Tom Tucker <[EMAIL PROTECTED]>


---

 kdb/kdbmain.c  |2 
 kdb/modules/Makefile   |3 
 kdb/modules/kdbm_openfabrics.c |  372

 3 files changed, 375 insertions(+), 2 deletions(-)

diff --git a/kdb/kdbmain.c b/kdb/kdbmain.c
index 931b643..b35139a 100644
--- a/kdb/kdbmain.c
+++ b/kdb/kdbmain.c
@@ -1154,8 +1154,8 @@ kdb_quiet(int reason)
  * none
  */
 
+void kdba_cpu_up(void) {};
 extern char kdb_prompt_str[];
-
 static int
 kdb_local(kdb_reason_t reason, int error, struct pt_regs *regs,
kdb_dbtrap_t db_result)
 {
diff --git a/kdb/modules/Makefile b/kdb/modules/Makefile
index ae2ac53..fbf05e1 100644
--- a/kdb/modules/Makefile
+++ b/kdb/modules/Makefile
@@ -6,7 +6,8 @@ #
 # Copyright (c) 1999-2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 
-obj-$(CONFIG_KDB_MODULES) += kdbm_pg.o kdbm_task.o kdbm_vm.o
kdbm_sched.o
+obj-$(CONFIG_KDB_MODULES) += kdbm_pg.o kdbm_task.o kdbm_vm.o
kdbm_sched.o \
+   kdbm_openfabrics.o
 ifdef CONFIG_X86
 ifndef CONFIG_X86_64
 obj-$(CONFIG_KDB_MODULES) += kdbm_x86.o
diff --git a/kdb/modules/kdbm_openfabrics.c
b/kdb/modules/kdbm_openfabrics.c
new file mode 100644
index 000..fdf204b
--- /dev/null
+++ b/kdb/modules/kdbm_openfabrics.c
@@ -0,0 +1,372 @@
+/*
+ * Copyright (c) 2006 Tom Tucker, Open Grid Computing, Inc. 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../drivers/infiniband/hw/amso1100/c2_provider.h"
+
+MODULE_AUTHOR("Tom Tucker");
+MODULE_DESCRIPTION("Debug RDMA");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static const char *wc_status_str[] = {
+   "SUCCESS",
+   "LOC_LEN_ERR",
+   "LOC_QP_OP_ERR",
+   "LOC_EEC_OP_ERR",
+   "LOC_PROT_ERR",
+   "WR_FLUSH_ERR",
+   "MW_BIND_ERR",
+   "BAD_RESP_ERR",
+   "LOC_ACCESS_ERR",
+   "REM_INV_REQ_ERR",
+   "REM_ACCESS_ERR",
+   "REM_OP_ERR",
+   "RETRY_EXC_ERR",
+   "RNR_RETRY_EXC_ERR",
+   "LOC_RDD_VIOL_ERR",
+   "REM_INV_RD_REQ_ERR",
+   "REM_ABORT_ERR",
+   "INV_EECN_ERR",
+   "INV_EEC_STATE_ERR",
+   "FATAL_ERR",
+   "RESP_TIMEOUT_ERR",
+   "GENERAL_ERR"
+};
+
+static inline const char *wc_status_to_str(int status)
+{
+   if (status > (sizeof(wc_status_str) / sizeof(wc_status_str[0])))
+   return "";
+
+   return wc_status_str[status];
+}
+
+static const char *wc_opcode_str[] = {
+   "SEND",
+   "RDMA_WRITE",
+   "RDMA_READ",
+   "COMP_SWAP",
+   "FETCH_ADD",
+   "BIND_MW",
+   "RECV",
+   "RECV_RDMA_WITH_IMM",
+};
+
+static inline const char* wc_opcode_to_str(int op) 
+{
+   
+   if (op > 129)
+   return "";
+   else if (op >= 128)
+   op -= 122;
+   return wc_opcode_str[op];
+}
+
+static int
+print_ib_wc(int argc, const char **argv, const char **envp,
+   struct pt_regs *regs)
+{
+   int ret = 0;
+
+   if (argc == 1) {
+   kdb_machreg_t addr;
+   int nextarg = 1;
+   long offset = 0;
+   struct ib_wc wc;
+
+   ret = kdbgetaddrarg(argc, argv, &nextarg, &addr, 
+   &offset, NULL, regs);
+   if (ret) 
+   return ret;
+
+   kdb_printf("struct ib_wc [%p]\n", (void*)addr);
+   ret = kdb_getarea_size((void*)&wc, (unsigned long)addr, 
+  sizeof wc);
+   if (ret)
+   return ret;
+
+   kdb_printf("  wr_id  : %llx\n", wc.wr_id);
+   kdb_printf("  status : \"%s\"\

Re: RDMA will be reverted

2006-07-25 Thread Tom Tucker
On Mon, 2006-07-24 at 15:23 -0700, David Miller wrote: 
> From: Tom Tucker <[EMAIL PROTECTED]>
> Date: Wed, 05 Jul 2006 12:09:42 -0500
> 
> > "A TOE net stack is closed source firmware. Linux engineers have no way
> > to fix security issues that arise. As a result, only non-TOE users will
> > receive security updates, leaving random windows of vulnerability for
> > each TOE NIC's users."
> > 
> > - A Linux security update may or may not be relevant to a vendors
> > implementation. 
> > 
> > - If a vendor's implementation has a security issue then the customer
> > must rely on the vendor to fix it. This is no less true for iWARP than
> > for any adapter.
> 
> This isn't how things actually work.
> 
> Users have a computer, and they can rightly expect the community
> to help them solve problems that occur in the upstream kernel.
> 
> When a bug is found and the person is using NIC X, we don't
> necessarily forward the bug report to the vendor of NIC X.
> Instead we try to fix the bug.  Many chip drivers are maintained
> by people who do not work for the company that makes the chip,
> and this works just fine.
> 
> If only the chip vendor can fix a security problem, this makes Linux
> less agile to fix.  Even aspect of a problem on a Linux system that
> cannot be fixed entirely by the community is a net negative for Linux.
> 

All true. What I meant to say was that this is "no less true than for
any deep adapter". It is incontrovertible that a deep adapter is less
flexible, and more difficult to support than a shallow adapter.

> > - iWARP needs to do protocol processing in order to validate and
> > evaluate TCP payload in advance of direct data placement. This
> > requirement is independent of CPU speed. 
> 
> Yet, RDMA itself is just an optimization meant to deal with
> limitations of cpu and memory speed.  You can rephrase the
> situation in whatever way suits your argument, but it does not
> make the core issue go away :)

Yep.

> 
> > - I suspect that connection rates for RDMA adapters fall well-below the
> > rates attainable with a dumb device. That said, all of the RDMA
> > applications that I know of are not connection intensive. Even for TOE,
> > the later HTTP versions makes connection rates less of an issue.
> 
> This is a very naive evaluation of the situation.  Yes, newer
> versions of protocols such as HTTP make the per-client connection
> burdon lower, but the number of clients will increase in time to
> more than makeup for whatever gains are seen due to this.

Naive is being kind, my HTTP comment is irrelevant :).  

> And then you have protocols which by design are connection heavy,
> and rightly so, such as bittorrent.
> 
> Being able to handle enormous numbers of connections, with extreme
> scalability and low latency, is an absolute requirement of any modern
> day serious TCP stack.  And this requirement is not going away.
> Wishing this requirement away due to HTTP persistent connections is
> very unrealistic, at best.
> 
> > - This is the problem we're trying to solve...incrementally and
> > responsibly.
> 
> You can't.  See my email to Roland about why even VJ net channels
> are found to be impractical.  To support netfilter properly, you
> must traverse the whole netfilter stack, because NAT can rewrite
> packets, yet still make them destined for the local system, and
> thus they will have a different identity for connection demux
> by the time the TCP stack sees the packet.
> 

I'm not claiming that all the problems can be solved, I'm suggesting
that integration could be better and that partial integration is better
than none. 

> All of these tranformations occur between normal packet receive
> and the TCP stack.  You would therefore need to put your card
> between netfilter and TCP in the packet input path, and at that
> point why bother with the stateful card at all?
> 
> The fact is that stateless approaches will always be better than
> stateful things because you cannot replicate the functionality we
> have in the Linux stack without replicating 10 years of work into
> your chip's firmware.  At that point you should just run Linux
> on your NIC since that is what you are effectively doing :)
> 

I wish...I'd have a better stack. 

> In conversations such as these, it helps us a lot if you can be frank
> and honest about the true absolute limitations of your technology.  

I'm trying ... classifying these limitations as "core can't fix" and
"fixable with integration" is where we're getting crosswise. 

> I
> can see that your viewpoint is tainted when I hear things suc

Re: RDMA will be reverted

2006-07-07 Thread Tom Tucker
On Thu, 2006-07-06 at 23:53 -0700, David Miller wrote:
> From: Tom Tucker <[EMAIL PROTECTED]>
> Date: Thu, 06 Jul 2006 00:25:03 -0500
> 
> > This patch is about dotting I's and crossing T's, it's not about
> > foundations.
> 
> You assume that I've flat out rejected RDMA, in fact I haven't.  I
> really don't have enough information to form a final opinion yet.
> There's about a week of emails on this topic which I need to read
> and digest first.

I realize that there is a tremendous amount of work out there and this
is just one thread.

> 
> What I am saying, however, is that we need to understand the
> technology and the hooks you guys want before we put any of it in.

Absolutely.

> 
> I don't think that's unreasonable.

Not at all. Let me know if I can help.

Tom



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-07-06 Thread Tom Tucker
On Fri, 2006-07-07 at 10:03 +1000, Herbert Xu wrote:
> On Thu, Jul 06, 2006 at 12:36:24PM -0500, Tom Tucker wrote:
> > 
> > The RDMA CMA uses IP addresses and port numbers to create a uniform
> > addressing scheme across all transport types. For IB, it is necessary to
> > resolve IP addresses to IB GIDs. The ARP protocol is used to do this and
> > a netfilter rule is installed to snoop the incoming ARP replies. This
> > would not be necessary if ARP events were provided as in the patch. 
> 
> Well the concerns we have do not apply to just iWARP, but RDMA/IP in
> general so this isn't really another technology.
> 
> In fact, it seems that we now have IP-specific knowledge living in
> drivers/infiniband/core/cma.c which is suboptimal.

To be clear the CMA doesn't look in the ARP packet, it just uses the
existence of the packet as an indication that it should check to see if
the ARP request it submitted for an IP address has been resolved yet. I
agree that this is suboptimal and why I think the notifier is a nice
alternative. 

> 
> > Unified wire iSCSI adapters have the same issue as iWARP wrt to managing
> > IP addresses and ports.
> 
> If by Unified wire iSCSI you mean something that presents a SCSI interface
> together with an Ethernet interface where the two share the same MAC and
> IP address, 

Yes, this is what I mean. But the notifier doesn't actually support
this, you would need to expose the IP/port space database to solve that
problem.

What I was referring to relative to  iSCSI is if the adapter is relying
on Linux to do ARP via the above suboptimal implementation, then it
would benefit from the notifier patch.

> then we have the same concerns with it as we do with iWARP or
> TOE.

Yes indeed.

> 
> Cheers,




-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-07-06 Thread Tom Tucker
On Fri, 2006-07-07 at 00:08 +1000, Herbert Xu wrote:
> Tom Tucker <[EMAIL PROTECTED]> wrote:
> > 
> > All that said, the proposed patch helps not only iWARP, but other
> > transports (iSCSI, IB) as well. It is not large, invasive,
> 
> Care to explain on how it helps those other technologies?

The RDMA CMA uses IP addresses and port numbers to create a uniform
addressing scheme across all transport types. For IB, it is necessary to
resolve IP addresses to IB GIDs. The ARP protocol is used to do this and
a netfilter rule is installed to snoop the incoming ARP replies. This
would not be necessary if ARP events were provided as in the patch. 

Unified wire iSCSI adapters have the same issue as iWARP wrt to managing
IP addresses and ports.


> 
> Cheers,

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-07-05 Thread Tom Tucker
On Wed, 2006-07-05 at 20:03 -0700, David Miller wrote:
> From: Roland Dreier <[EMAIL PROTECTED]>
> Date: Wed, 05 Jul 2006 13:29:35 -0700
> 
> > The way forward seems to be to merge basic iWARP support that lives in
> > drivers/infiniband, and then you can accept or reject things for
> > better integration, like notifiers for routing changes.
> 
> 
> Let's merge in drivers before the necessary infrastructure.
> 
> 
> No, I think that's not the way forward.  You build the foundation
> before the house, if the foundation cannot be built then you are
> wasting your time with the house idea.

We have been running NFS and other apps over iWARP 24x7 for the last 6
mos...without the proposed netdev patch. We've run 200+ node MPI
clusters for days and days over iWARP...without the proposed netdev
patch. We ran iWARP interoperability tests across the country between
Boston and San Jose...without ... yes I know ... you get it.


News flash...the foundation is built!


But! Our stable LAN and the WAN tests didn't often experience MTU
changes, and redirects...but of course we knew these were inevitable. So
our goal was to make iWARP more robust in the face of a more dynamic
network topology. Shutters on the house maybe...I dunno, it's your
analogy ;-)

All that said, the proposed patch helps not only iWARP, but other
transports (iSCSI, IB) as well. It is not large, invasive,
intrusive...hell it's not even new. It leverages an existing event
notifier mechanism. 

This patch is about dotting I's and crossing T's, it's not about
foundations.


> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-07-05 Thread Tom Tucker
On Sat, 2006-07-01 at 16:26 +0200, Andi Kleen wrote:
> On Saturday 01 July 2006 01:01, Tom Tucker wrote:
> > On Fri, 2006-06-30 at 14:16 -0700, David Miller wrote:
> > 
> > > The TOE folks have tried to submit their hooks and drivers
> > > on several occaisions, and we've rejected it every time.
> > 
> > iWARP != TOE
> 
> Perhaps a good start of that discussion David asked for would 
> be if you could give us an overview of the differences
> and how you avoid the TOE problems.

I think Roland already gave the high-level overview. For those
interested in some of the details, the API for iWARP transports was
originally conceived independently from IB and is documented in the
RDMAC Verbs Specification found here:

http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf

The protocols, etc... are available here:
http://www.ietf.org/html.charters/rddp-charter.html

As Roland mentioned, the RDMAC verbs are *very* similar to the IB verbs
and so when we were thinking about how to design an API for iWARP we
concluded it would be best to leverage the tremendous amount of work
already done for IB by OpenFabrics and then work iteratively to extend
this API to include features unique to iWARP. This work has been ongoing
since September of 2005. 

There is an open source svn repository available for the iWARP source at
https://openib.org/svn/gen2/branches/iwarp.

There is also an open source NFS over RDMA implementation for Linux
available here that: http://sourceforge.net/projects/nfs-rdma.


So how do we avoid the TOE pitfalls with iWARP? I think it depends on
the pitfall. At the low level:

- Stale Network/Address Information: Path MTU Change, ICMP Redirect 
and ARP next hop changes need netlink notifier events so that hardware
can be updated when they change. I see this support as an extension (new
events) to an existing service and a relatively low-level of "parallel
stack integration". iSCSI and IB could also benefit from these events.

- Port Space Collision, i.e. socket app and rdma/iWARP apps collide on 
a port number: The RDMA CMA needs to be able to allocate and de-allocate
port numbers, however, the services that do this today are not exported
and would need some minor tweaking. iSCSI and IB benefit from these
services as well.

- netfilter rules, syn-flood, conn-rate, etc You pointed out that 
if connection establishment were done in the native stack (SYN,
SYN/ACK), that this would account for the bulk of the netfilter utility,
however, this probably results in falling into many of the TOE traps
people have issue with.

WRT to http://linux-net.osdl.org/index.php/TOE

Security Updates

"A TOE net stack is closed source firmware. Linux engineers have no way
to fix security issues that arise. As a result, only non-TOE users will
receive security updates, leaving random windows of vulnerability for
each TOE NIC's users."

- A Linux security update may or may not be relevant to a vendors
implementation. 

- If a vendor's implementation has a security issue then the customer
must rely on the vendor to fix it. This is no less true for iWARP than
for any adapter.

Point-in-time Solution

"Each TOE NIC has a limited lifetime of usefulness, because system
hardware rapidly catches up to TOE performance levels, and eventually
exceeds TOE performance levels. We saw this with 10mbit TOE, 100mbit
TOE, gigabit TOE, and soon with 10gig TOE."

- iWARP needs to do protocol processing in order to validate and
evaluate TCP payload in advance of direct data placement. This
requirement is independent of CPU speed. 

Different Network Behavior

"System administrators are quite familiar with how the Linux network
stack interoperates with the world at large. TOE is a black box, each
NIC requires re-examination of network behavior. Network scanners and
analysis tools must be updated, or they will provide faulty analysis."

- Native Linux Tools like tcpdump, netstat, etc... will not work as
expected. 

- Network Analyzers such as Finisar, etc... will work just fine.

Performance

"Experience has shown that TOE implementations require additional work
(programming the hardware, hardware-specific socket manipulation) to set
up and tear down connections. For connection intensive protocols such as
HTTP, TOE often underperforms."

- I suspect that connection rates for RDMA adapters fall well-below the
rates attainable with a dumb device. That said, all of the RDMA
applications that I know of are not connection intensive. Even for TOE,
the later HTTP versions makes connection rates less of an issue.

Hardware-specific limits 

"TOE NICs are more resource limited than your overall computer system.
This is most readily apparent under load, when trying to support
thousands of simultaneous connections. TOE NICs simply do not have the
memory resources to buffer thousands of connections, much les

Re: RDMA will be reverted

2006-06-30 Thread Tom Tucker
On Fri, 2006-06-30 at 14:16 -0700, David Miller wrote:

> The TOE folks have tried to submit their hooks and drivers
> on several occaisions, and we've rejected it every time.

iWARP != TOE

> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-06-29 Thread Tom Tucker
On Thu, 2006-06-29 at 13:53 -0700, David Miller wrote:
> From: Tom Tucker <[EMAIL PROTECTED]>
> Date: Thu, 29 Jun 2006 15:47:13 -0500
> 
> > I concede that features have been lost, but some applications benefit
> > greatly from RDMA. For these applications and these customers,
> 
> TOE folks give the same story... it's a broken record, really.
> 
> Let us know when you can say something new about the situation.
> 
> Under Linux we get to make better long term architectually sane
> decisions, even if it is to the dismay of the backers of certain
> short-sighted pieces of technology.

Would you indulge me with one final clarification?  

- Are you condemning RDMA over TCP as an ill-conceived technology?

- Are you condemning the implementation of iWARP? 

- Are you condemning both?

Thanks,
Tom



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-06-29 Thread Tom Tucker
On Thu, 2006-06-29 at 13:19 -0700, David Miller wrote:
> From: Tom Tucker <[EMAIL PROTECTED]>
> Date: Thu, 29 Jun 2006 15:11:06 -0500
> 
> > Doesn't this position force vendors to build deeper adapters, not
> > shallower adapters? Isn't this exactly the opposite of what is intended?
> 
> Nope.
> 
> Look at what the networking developers give a lot of attention and
> effort to, things like TCP Large Receive Offload, and Van Jacobson net
> channels, both of which are fully stack integrated receive performance
> enhancements.  They do not bypass netfilter, they do not bypass
> packet scheduling, and yet they provide a hardware assist performance
> improvement for receive.

These technologies are integrated because someone chose to and was
allowed to integrate them. I contend that iWARP could be equally well
integrated if the decision was made to do so. It would, however, require
cooperation from both the hardware vendors and the netdev maintainers.

> 
> This has been stated over and over again.

For TOE, you are correct, however, for iWARP, you can't do RDMA (direct
placement into application buffers) without state in the adapter. I
personally tried very hard to build an adapter without doing so, but
alas, I failed ;-) 

> 
> If companies keep designing undesirable hardware that unnecessarily
> takes features away from the user, that really is not our problem.

I concede that features have been lost, but some applications benefit
greatly from RDMA. For these applications and these customers, the
hardware is not undesirable and the fact that netfilter won't work on
their sub 5us latency adapter is not perceived to be a big issue. The
mention of packet scheduling would cause an apoplectic seizure...unless
it were in the hardware...

All that verbiage aside, I believe that it is not a matter of whether it
is possible to integrate iWARP it is question of whether it is
permissible.




-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-06-29 Thread Tom Tucker
[...snip...]
> community at large, Linux and Andrew, you? I'm not trying to be

Linus sorry... spell checker...



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-06-29 Thread Tom Tucker
On Thu, 2006-06-29 at 12:46 -0700, David Miller wrote:
> From: Roland Dreier <[EMAIL PROTECTED]>
> Date: Thu, 29 Jun 2006 09:54:37 -0700
> 
> > In any case I think we need to find a way for Linux to support iWARP
> > hardware, since there are users that want this, and (some of) the
> > vendors are working hard to do things the right way (including cc'ing
> > netdev on the conversation).  I don't think it's good for Linux for
> > the answer to just be, "sorry, you're wrong to want to use that hardware."
> 
> We give the same response for TOE stuff.

What does the word "we" represent in this context? Is it the Linux
community at large, Linux and Andrew, you? I'm not trying to be
argumentative, I just want to understand how carefully and by whom iWARP
technology has been considered.

> 
> The integration of iWARP with the Linux networking, while much better
> than TOE, is still heavily flawed.
> 
> What most people might not realize when using this stuff is that:

Agreed, the patch improves some things, but doesn't address others. But
isn't this position a condemnation of the good to spite the bad? 

> 
> 1) None of their firewall rules will apply to the iWARP communications.
> 2) None of their packet scheduling configurations can be applied to
>the iWARP communications.
> 3) It is not possible to encapsulate iWARP traffic in IPSEC
> 
> And the list goes on and on.

It does, however, this position statement makes things worse, not
better. By this I mean that deep adapters (iSCSI, iWARP) are even more
debilitated by not being able to snoop MTU changes, etc... and are
therefore forced to duplicate sub-systems (e.g. ARP, ICMP, ...) already
ably implemented in host software.

> This is what we don't like about technologies that implement their own
> networking stack in the card firmware.

Doesn't this position force vendors to build deeper adapters, not
shallower adapters? Isn't this exactly the opposite of what is intended?

> 

> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RDMA will be reverted

2006-06-28 Thread Tom Tucker
On Wed, 2006-06-28 at 00:07 -0700, David Miller wrote:
> Roland, there is no way in the world we would have let support for
> RDMA into the kernel tree had we seen and reviewed it on netdev.  I've
> discussed this with Andrew Morton, and we'd like you to please revert
> all of the RDMA code from Linus's tree immedialtely.
> 
> Folks are well aware how against RDMA and TOE type schemes the Linux
> networking developers are.  So the fact that none of these RDMA
> changes went up for review on netdev strikes me as just a little bit
> more than suspicious.
> 
> Please do not do this again, thank you.

I believe Roland is on vacation (they just had a baby..). 

It is my belief that everything that Roland submitted went through both
netdev and lklm reviews. 


> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/7] AMSO1100 Memory Management.

2006-06-12 Thread Tom Tucker
On Thu, 2006-06-08 at 01:17 -0700, Andrew Morton wrote:
> On Wed, 07 Jun 2006 15:06:55 -0500
> Steve Wise <[EMAIL PROTECTED]> wrote:
> 
> > 
> > +void c2_free(struct c2_alloc *alloc, u32 obj)
> > +{
> > +   spin_lock(&alloc->lock);
> > +   clear_bit(obj, alloc->table);
> > +   spin_unlock(&alloc->lock);
> > +}
> 
> The spinlock is unneeded here.

Good point.

> 
> 
> What does all the code in this file do, anyway?  It looks totally generic
> (and hence inappropriate for drivers/infiniband/hw/amso1100/) and somewhat
> similar to idr trees, perhaps.
> 

We mimicked the mthca driver. It may be code that should be replaced
with Linux core services for new drivers. We'll investigate.

> > +int c2_array_set(struct c2_array *array, int index, void *value)
> > +{
> > +   int p = (index * sizeof(void *)) >> PAGE_SHIFT;
> > +
> > +   /* Allocate with GFP_ATOMIC because we'll be called with locks held. */
> > +   if (!array->page_list[p].page)
> > +   array->page_list[p].page =
> > +   (void **) get_zeroed_page(GFP_ATOMIC);
> > +
> > +   if (!array->page_list[p].page)
> > +   return -ENOMEM;
> 
> This _will_ happen under load.  What will the result of that be, in the
> context of thise driver?

A higher level object allocation will fail. In this case, a kernel
application request will fail and the application must handle the error.
> 
> This function is incorrectly designed - it should receive a gfp_t argument.
> Because you don't *know* that the caller will always hold a spinlock.  And
> GFP_KERNEL is far, far stronger than GFP_ATOMIC.

This service is allocating a page that the adapter will DMA 2B message
indices into. 
> 
> > +static int c2_alloc_mqsp_chunk(gfp_t gfp_mask, struct sp_chunk **head)
> > +{
> > +   int i;
> > +   struct sp_chunk *new_head;
> > +
> > +   new_head = (struct sp_chunk *) __get_free_page(gfp_mask | GFP_DMA);
> 
> Why is __GFP_DMA in there?  Unless you've cornered the ISA bus infiniband
> market, it's likely to be wrong.
> 

Flag confusion about what GFP_DMA means. We'll revisit this whole
file ... 

> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] iWARP Connection Manager.

2006-06-12 Thread Tom Tucker
Andrew, thanks for the review, comments inline...

On Thu, 2006-06-08 at 00:54 -0700, Andrew Morton wrote:
> On Wed, 07 Jun 2006 15:06:05 -0500
> Steve Wise <[EMAIL PROTECTED]> wrote:
> 
> > 
> > This patch provides the new files implementing the iWARP Connection
> > Manager.
> > 
> > Review Changes:
> > 
> > - sizeof -> sizeof()
> > 
> > - removed printks
> > 
> > - removed TT debug code
> > 
> > - cleaned up lock/unlock around switch statements.
> > 
> > - waitqueue -> completion for destroy path.
> >
> > ...
> >
> > +/* 
> > + * This function is called on interrupt context. Schedule events on
> > + * the iwcm_wq thread to allow callback functions to downcall into
> > + * the CM and/or block.  Events are queued to a per-CM_ID
> > + * work_list. If this is the first event on the work_list, the work
> > + * element is also queued on the iwcm_wq thread.
> > + *
> > + * Each event holds a reference on the cm_id. Until the last posted
> > + * event has been delivered and processed, the cm_id cannot be
> > + * deleted. 
> > + */
> > +static void cm_event_handler(struct iw_cm_id *cm_id,
> > +struct iw_cm_event *iw_event) 
> > +{
> > +   struct iwcm_work *work;
> > +   struct iwcm_id_private *cm_id_priv;
> > +   unsigned long flags;
> > +
> > +   work = kmalloc(sizeof(*work), GFP_ATOMIC);
> > +   if (!work)
> > +   return;
> 
> This allocation _will_ fail sometimes.  The driver must recover from it. 
> Will it do so?

Er...no. It will lose this event. Depending on the event...the carnage
varies. We'll take a look at this.

> 
> > +EXPORT_SYMBOL(iw_cm_init_qp_attr);
> 
> This file exports a ton of symbols.  It's usual to provide some justifying
> commentary in the changelog when this happens.

This module is a logical instance of the xx_cm where xx is the transport
type. I think there is some discussion warranted on whether or not these
should all be built into and exported by rdma_cm. One rationale would be
that the rdma_cm is the only client for many of these functions (this
being a particularly good example) and doing so would reduce the export
count. Others would be reasonably needed for any application (connect,
etc...)

All that said, we'll be sure to document the exported symbols in a
follow-up patch.

> 
> > +/*
> > + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
> > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
> > + *
> > + * This software is available to you under a choice of one of two
> > + * licenses.  You may choose to be licensed under the terms of the GNU
> > + * General Public License (GPL) Version 2, available from the file
> > + * COPYING in the main directory of this source tree, or the
> > + * OpenIB.org BSD license below:
> > + *
> > + * Redistribution and use in source and binary forms, with or
> > + * without modification, are permitted provided that the following
> > + * conditions are met:
> > + *
> > + *  - Redistributions of source code must retain the above
> > + *copyright notice, this list of conditions and the following
> > + *disclaimer.
> > + *
> > + *  - Redistributions in binary form must reproduce the above
> > + *copyright notice, this list of conditions and the following
> > + *disclaimer in the documentation and/or other materials
> > + *provided with the distribution.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> > + * SOFTWARE.
> > + */
> > +#if !defined(IW_CM_PRIVATE_H)
> > +#define IW_CM_PRIVATE_H
> 
> We normally use #ifndef here.

We'll change this..

> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] iWARP Core Changes.

2006-06-07 Thread Tom Tucker

A reference is being taken on an iWARP device that is never getting
released. This prevents a participating iWARP netdev device from being
unloaded after a connection has been established on the passive side.

Search for ip_dev_find below...

On Wed, 2006-06-07 at 15:06 -0500, Steve Wise wrote:
> This patch contains modifications to the existing rdma header files,
> core files, drivers, and ulp files to support iWARP.
> 
> Review updates:
> 
> - copy_addr() -> rdma_copy_addr()
> 
> - dst_dev_addr param in rdma_copy_addr to const.
> 
> - various spacing nits with recasting
> 
> - include linux/inetdevice.h to get ip_dev_find() prototype.
> ---
> 
>  drivers/infiniband/core/Makefile |4 
>  drivers/infiniband/core/addr.c   |   19 +
>  drivers/infiniband/core/cache.c  |8 -
>  drivers/infiniband/core/cm.c |3 
>  drivers/infiniband/core/cma.c|  353 
> +++---
>  drivers/infiniband/core/device.c |6 
>  drivers/infiniband/core/mad.c|   11 +
>  drivers/infiniband/core/sa_query.c   |5 
>  drivers/infiniband/core/smi.c|   18 +
>  drivers/infiniband/core/sysfs.c  |   18 +
>  drivers/infiniband/core/ucm.c|5 
>  drivers/infiniband/core/user_mad.c   |9 -
>  drivers/infiniband/hw/ipath/ipath_verbs.c|2 
>  drivers/infiniband/hw/mthca/mthca_provider.c |2 
>  drivers/infiniband/ulp/ipoib/ipoib_main.c|8 +
>  drivers/infiniband/ulp/srp/ib_srp.c  |2 
>  include/rdma/ib_addr.h   |   15 +
>  include/rdma/ib_verbs.h  |   39 +++
>  18 files changed, 435 insertions(+), 92 deletions(-)
> 
> diff --git a/drivers/infiniband/core/Makefile 
> b/drivers/infiniband/core/Makefile
> index 68e73ec..163d991 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -1,7 +1,7 @@
>  infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS)   := ib_addr.o rdma_cm.o
>  
>  obj-$(CONFIG_INFINIBAND) +=  ib_core.o ib_mad.o ib_sa.o \
> - ib_cm.o $(infiniband-y)
> + ib_cm.o iw_cm.o $(infiniband-y)
>  obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
>  obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=  ib_uverbs.o ib_ucm.o
>  
> @@ -14,6 +14,8 @@ ib_sa-y :=  sa_query.o
>  
>  ib_cm-y :=   cm.o
>  
> +iw_cm-y :=   iwcm.o
> +
>  rdma_cm-y := cma.o
>  
>  ib_addr-y := addr.o
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index d294bbc..83f84ef 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -32,6 +32,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -60,12 +61,15 @@ static LIST_HEAD(req_list);
>  static DECLARE_WORK(work, process_req, NULL);
>  static struct workqueue_struct *addr_wq;
>  
> -static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
> -  unsigned char *dst_dev_addr)
> +int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
> +  const unsigned char *dst_dev_addr)
>  {
>   switch (dev->type) {
>   case ARPHRD_INFINIBAND:
> - dev_addr->dev_type = IB_NODE_CA;
> + dev_addr->dev_type = RDMA_NODE_IB_CA;
> + break;
> + case ARPHRD_ETHER:
> + dev_addr->dev_type = RDMA_NODE_RNIC;
>   break;
>   default:
>   return -EADDRNOTAVAIL;
> @@ -77,6 +81,7 @@ static int copy_addr(struct rdma_dev_add
>   memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
>   return 0;
>  }
> +EXPORT_SYMBOL(rdma_copy_addr);
>  
>  int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
>  {
> @@ -88,7 +93,7 @@ int rdma_translate_ip(struct sockaddr *a
>   if (!dev)
>   return -EADDRNOTAVAIL;
>  
> - ret = copy_addr(dev_addr, dev, NULL);
> + ret = rdma_copy_addr(dev_addr, dev, NULL);
>   dev_put(dev);
>   return ret;
>  }
> @@ -160,7 +165,7 @@ static int addr_resolve_remote(struct so
>  
>   /* If the device does ARP internally, return 'done' */
>   if (rt->idev->dev->flags & IFF_NOARP) {
> - copy_addr(addr, rt->idev->dev, NULL);
> + rdma_copy_addr(addr, rt->idev->dev, NULL);
>   goto put;
>   }
>  
> @@ -180,7 +185,7 @@ static int addr_resolve_remote(struct so
>   src_in->sin_addr.s_addr = rt->rt_src;
>   }
>  
> - ret = copy_addr(addr, neigh->dev, neigh->ha);
> + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
>  release:
>   neigh_release(neigh);
>  put:
> @@ -244,7 +249,7 @@ static int addr_resolve_local(struct soc
>   if (ZERONET(src_ip)) {
>   src_in->sin_family = dst_in->sin

Re: [openib-general] Re: [PATCH 1/2] iWARP Connection Manager.

2006-06-01 Thread Tom Tucker
On Thu, 2006-06-01 at 14:09 -0700, Sean Hefty wrote:
> Steve Wise wrote:
> >>>+int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
> >>>+{
> >>>+  struct iwcm_id_private *cm_id_priv;
> >>>+  unsigned long flags;
> >>>+  int ret = 0;
> >>>+
> >>>+  cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> >>>+  /* Wait if we're currently in a connect or accept downcall */
> >>>+  wait_event(cm_id_priv->connect_wait, 
> >>>+ !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));
> >>
> >>Am I understanding this check correctly?  You're checking to see if the 
> >>user has 
> >>called iw_cm_disconnect() at the same time that they called iw_cm_connect() 
> >>or 
> >>iw_cm_accept().  Are connect / accept blocking, or are you just waiting for 
> >>an 
> >>event?
> > 
> > 
> > The CM must wait for the low level provider to finish a connect() or
> > accept() operation before telling the low level provider to disconnect
> > via modifying the iwarp QP.  Regardless of whether they block, this
> > disconnect can happen concurrently with the connect/accept so we need to
> > hold the disconnect until the connect/accept completes.
> > 
> > 
> >>>+EXPORT_SYMBOL(iw_cm_disconnect);
> >>>+static void destroy_cm_id(struct iw_cm_id *cm_id)
> >>>+{
> >>>+  struct iwcm_id_private *cm_id_priv;
> >>>+  unsigned long flags;
> >>>+  int ret;
> >>>+
> >>>+  cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> >>>+  /* Wait if we're currently in a connect or accept downcall. A
> >>>+   * listening endpoint should never block here. */
> >>>+  wait_event(cm_id_priv->connect_wait, 
> >>>+ !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));
> >>
> >>Same question/comment as above.
> >>
> > 
> > 
> > Same answer.  
> 
> There's a difference between trying to handle the user calling 
> disconnect/destroy at the same time a call to accept/connect is active, 
> versus 
> the user calling disconnect/destroy after accept/connect have returned.  In 
> the 
> latter case, I think you're fine.  In the first case, this is allowing a user 
> to 
> call destroy at the same time that they're calling accept/connect. 
> Additionally, there's no guarantee that the F_CONNECT_WAIT flag has been set 
> by 
> accept/connect by the time disconnect/destroy tests it.

The problem is that we can't synchronously cancel an outstanding connect
request. Once we've asked the adapter to connect, we can't tell him to
stop, we have to wait for it to fail. During the time period between
when we ask to connect and the adapter says yeah-or-nay, the user hits
ctrl-C. This is the case where disconnect and/or destroy gets called and
we have to block it waiting for the outstanding connect request to
complete.

One alternative to this approach is to do the kfree of the cm_id in the
deref logic. This was the original design and leaves the object around
to handle the completion of the connect and still allows the app to
clean up and go away without all this waitin' around. When the adapter
finally finishes and releases it's reference, the object is kfree'd.

Hope this helps.
 
> 
> - Sean
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [openib-general] Re: [PATCH 1/2] iWARP Connection Manager.

2006-06-01 Thread Tom Tucker
On Wed, 2006-05-31 at 15:22 -0700, Sean Hefty wrote:
> Steve Wise wrote:
> > +/* 
> > + * Release a reference on cm_id. If the last reference is being removed
> > + * and iw_destroy_cm_id is waiting, wake up the waiting thread.
> > + */
> > +static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
> > +{
> > +   int ret = 0;
> > +
> > +   BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> > +   if (atomic_dec_and_test(&cm_id_priv->refcount)) {
> > +   BUG_ON(!list_empty(&cm_id_priv->work_list));
> > +   if (waitqueue_active(&cm_id_priv->destroy_wait)) {
> > +   BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> > +   BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> > +   &cm_id_priv->flags));
> > +   ret = 1;
> > +   wake_up(&cm_id_priv->destroy_wait);
> 
> We recently changed the RDMA CM, IB CM, and a couple of other modules from 
> using 
> wait objects to completions.   This avoids a race condition between 
> decrementing 
> the reference count, which allows destruction to proceed, and calling wake_up 
> on 
> a freed cm_id.  My guess is that you may need to do the same.
> 
> Can you also explain the use of the return value here?  It's ignored below in 
> rem_ref() and destroy_cm_id().
> 
> > +static void add_ref(struct iw_cm_id *cm_id)
> > +{
> > +   struct iwcm_id_private *cm_id_priv;
> > +   cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> > +   atomic_inc(&cm_id_priv->refcount);
> > +}
> > +
> > +static void rem_ref(struct iw_cm_id *cm_id)
> > +{
> > +   struct iwcm_id_private *cm_id_priv;
> > +   cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> > +   iwcm_deref_id(cm_id_priv);
> > +}
> > +
> 
> > +/* 
> > + * CM_ID <-- CLOSING
> > + *
> > + * Block if a passive or active connection is currenlty being processed. 
> > Then
> > + * process the event as follows:
> > + * - If we are ESTABLISHED, move to CLOSING and modify the QP state
> > + *   based on the abrupt flag 
> > + * - If the connection is already in the CLOSING or IDLE state, the peer is
> > + *   disconnecting concurrently with us and we've already seen the 
> > + *   DISCONNECT event -- ignore the request and return 0
> > + * - Disconnect on a listening endpoint returns -EINVAL
> > + */
> > +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
> > +{
> > +   struct iwcm_id_private *cm_id_priv;
> > +   unsigned long flags;
> > +   int ret = 0;
> > +
> > +   cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> > +   /* Wait if we're currently in a connect or accept downcall */
> > +   wait_event(cm_id_priv->connect_wait, 
> > +  !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));
> 
> Am I understanding this check correctly?  You're checking to see if the user 
> has 
> called iw_cm_disconnect() at the same time that they called iw_cm_connect() 
> or 
> iw_cm_accept().  Are connect / accept blocking, or are you just waiting for 
> an 
> event?

Yes. The application (or the case I saw was user-mode exit logic after
ctrl-C) cleaning up at random times relative to connection
establishment. 

> 
> > +
> > +   spin_lock_irqsave(&cm_id_priv->lock, flags);
> > +   switch (cm_id_priv->state) {
> > +   case IW_CM_STATE_ESTABLISHED:
> > +   cm_id_priv->state = IW_CM_STATE_CLOSING;
> > +   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> > +   if (cm_id_priv->qp) { /* QP could be  for user-mode 
> > client */
> > +   if (abrupt)
> > +   ret = iwcm_modify_qp_err(cm_id_priv->qp);
> > +   else
> > +   ret = iwcm_modify_qp_sqd(cm_id_priv->qp);
> > +   /* 
> > +* If both sides are disconnecting the QP could
> > +* already be in ERR or SQD states
> > +*/
> > +   ret = 0;
> > +   }
> > +   else
> > +   ret = -EINVAL;
> > +   break;
> > +   case IW_CM_STATE_LISTEN:
> > +   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> > +   ret = -EINVAL;
> > +   break;
> > +   case IW_CM_STATE_CLOSING:
> > +   /* remote peer closed first */
> > +   case IW_CM_STATE_IDLE:  
> > +   /* accept or connect returned !0 */
> > +   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> > +   break;
> > +   case IW_CM_STATE_CONN_RECV:
> > +   /* 
> > +* App called disconnect before/without calling accept after
> > +* connect_request event delivered.
> > +*/
> > +   spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> > +   break;
> > +   case IW_CM_STATE_CONN_SENT:
> > +   /* Can only get here if wait above fails */
> > +   default:
> > +   BUG_ON(1);
> > +   }
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iw_cm_disconnect);
>

[PATCH][RFC] Notifier mechanism to notify RDMA devices of network events

2006-03-08 Thread Tom Tucker

This patch implements a mechanism that allows interested clients to 
register for notification of certain network events. The intended use
is to allow RDMA(OpenIB) devices to be notified of neighbour updates, 
ICMP redirects, path MTU changes, and route changes. 

The reason these devices need update events is because they typically 
cache this information in hardware and need to be notified when
this information has been updated.

This approach is one of many possibilities and may be preferred because
it uses an existing notification mechanism that has precedent in the 
stack. 

This code does not yet implement path MTU change because the number of 
places in which this value is updated is large and if this mechanism seems
reasonable, it would be probably be best to funnel these updates through a 
single function.


diff -u -r -x '.*' --new-file linux-2.6.14.5/include/net/netevent.h 
linux-2.6.14.5.tom/include/net/netevent.h
--- linux-2.6.14.5/include/net/netevent.h   1969-12-31 18:00:00.0 
-0600
+++ linux-2.6.14.5.tom/include/net/netevent.h   2006-01-16 13:42:09.0 
-0600
@@ -0,0 +1,41 @@
+#ifndef _NET_EVENT_H
+#define _NET_EVENT_H
+
+/*
+ * Generic netevent notifiers
+ *
+ * Authors:
+ *  Tom Tucker  <[EMAIL PROTECTED]>
+ *
+ * Changes:
+ */
+
+#ifdef __KERNEL__
+
+#include 
+
+struct netevent_redirect {
+   struct dst_entry *old;
+   struct dst_entry *new;
+};
+
+struct netevent_route_change {
+int event;
+struct fib_info *fib_info;
+};
+
+enum netevent_notif_type {
+   NETEVENT_NEIGH_UPDATE = 1, /* arg is * struct neighbour */
+   NETEVENT_ROUTE_UPDATE, /* arg is * netevent_route_change */
+   NETEVENT_PMTU_UPDATE,
+   NETEVENT_REDIRECT, /* arg is * struct netevent_redirect */
+};
+
+extern int register_netevent_notifier(struct notifier_block *nb);
+extern int unregister_netevent_notifier(struct notifier_block *nb);
+extern int call_netevent_notifiers(unsigned long val, void *v);
+
+#endif
+#endif
+
+
diff -u -r -x '.*' --new-file linux-2.6.14.5/net/core/Makefile 
linux-2.6.14.5.tom/net/core/Makefile
--- linux-2.6.14.5/net/core/Makefile2005-12-26 18:26:33.0 -0600
+++ linux-2.6.14.5.tom/net/core/Makefile2006-01-16 13:41:42.0 
-0600
@@ -7,7 +7,7 @@
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
-obj-y   += dev.o ethtool.o dev_mcast.o dst.o \
+obj-y   += dev.o ethtool.o dev_mcast.o dst.o netevent.o \
neighbour.o rtnetlink.o utils.o link_watch.o filter.o
 
 obj-$(CONFIG_XFRM) += flow.o
diff -u -r -x '.*' --new-file linux-2.6.14.5/net/core/neighbour.c 
linux-2.6.14.5.tom/net/core/neighbour.c
--- linux-2.6.14.5/net/core/neighbour.c 2005-12-26 18:26:33.0 -0600
+++ linux-2.6.14.5.tom/net/core/neighbour.c 2006-01-16 13:41:42.0 
-0600
@@ -30,9 +30,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #define NEIGH_DEBUG 1
 
@@ -756,6 +758,7 @@
NEIGH_PRINTK2("neigh %p is suspected.\n", neigh);
neigh->nud_state = NUD_STALE;
neigh_suspect(neigh);
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
}
} else if (state & NUD_DELAY) {
if (time_before_eq(now, 
@@ -763,6 +766,7 @@
NEIGH_PRINTK2("neigh %p is now reachable.\n", neigh);
neigh->nud_state = NUD_REACHABLE;
neigh_connect(neigh);
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
next = neigh->confirmed + neigh->parms->reachable_time;
} else {
NEIGH_PRINTK2("neigh %p is probed.\n", neigh);
@@ -781,6 +785,7 @@
 
neigh->nud_state = NUD_FAILED;
notify = 1;
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
NEIGH_CACHE_STAT_INC(neigh->tbl, res_failed);
NEIGH_PRINTK2("neigh %p is failed.\n", neigh);
 
@@ -1051,6 +1056,9 @@
(neigh->flags | NTF_ROUTER) :
(neigh->flags & ~NTF_ROUTER);
}
+
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+
write_unlock_bh(&neigh->lock);
 #ifdef CONFIG_ARPD
if (notify && neigh->parms->app_probes)
diff -u -r -x '.*' --new-file linux-2.6.14.5/net/core/netevent.c 
linux-2.6.14.5.tom/net/core/netevent.c
--- linux-2.6.14.5/net/core/netevent.c  1969-12-31 18:00:00.0 -0600
+++ linux-2.6.14.5.tom/net/core/netevent.c  2006-01-16 13:50:25.0 
-0600
@@ -0,0 +1,69 @@
+/*
+ * Network event notifiers
+ *
+ * Authors:
+ *  Tom Tucker <[EMAIL PROTECTED]>
+ *
+