[openib-general] [PATCH] mthca: fix WQE size calculation in create-srq

2005-12-17 Thread Jack Morgenstein
Thinko: 64 bytes is the minimum SRQ WQE size (not the maximum).

Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]> 

Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_srq.c
===
--- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_srq.c
+++ linux-kernel/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -198,7 +198,7 @@ int mthca_alloc_srq(struct mthca_dev *de
if (mthca_is_memfree(dev))
srq->max = roundup_pow_of_two(srq->max + 1);
 
-   ds = min(64UL,
+   ds = max(64UL,
 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
srq->max_gs * sizeof (struct 
mthca_data_seg)));
srq->wqe_shift = long_log2(ds);
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Andi Kleen
On Sat, Dec 17, 2005 at 09:36:29PM -0800, Robert Walsh wrote:
> On Sun, 2005-12-18 at 04:27 +0100, Andi Kleen wrote:
> > Robert Walsh <[EMAIL PROTECTED]> writes:
> > > 
> > > Any chance we could get these moved into the x86_64 arch directory,
> > > then?  We have to do double-word copies, or our chip gets unhappy.
> > 
> > Standard memcpy will do double word copies if everything is suitably
> > aligned. Just use that.
> 
> This is dealing with buffers that may be passed in from user space, so
> there's no guarantee of alignment for either the start address or the
> length.

So how can you do double word access when the length is not a multiple of four? 

The current x86-64 copy_from_user will use double work access even in that case,
except for the end of course.

But what you're doing is so deeply unportable it's not funny. I am not
sure such a unportable driver even belongs in the kernel.

If the code was really intended to run on user space addresses it 
was totally broken btw because it didn't handle exceptions.

-Andi

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Robert Walsh
On Sun, 2005-12-18 at 04:27 +0100, Andi Kleen wrote:
> Robert Walsh <[EMAIL PROTECTED]> writes:
> > 
> > Any chance we could get these moved into the x86_64 arch directory,
> > then?  We have to do double-word copies, or our chip gets unhappy.
> 
> Standard memcpy will do double word copies if everything is suitably
> aligned. Just use that.

This is dealing with buffers that may be passed in from user space, so
there's no guarantee of alignment for either the start address or the
length.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Robert Walsh
> > Any chance we could get these moved into the x86_64 arch directory,
> > then?
> 
> That would make sense.  Give it a non-ipath-related name and require that
> all architectures which wish to run this driver must implement that
> (documented) function.
> 
> And, in Kconfig, make sure that architectures which don't implement that
> library function do not attempt to build this driver.  To avoid breaking
> `make allmodconfig'.

Sounds good.  I'll get something together next week.

> >  We have to do double-word copies, or our chip gets unhappy.
> 
> In what form is this chip available?  As a standard PCI/PCIX card which
> people will want to plug into power4/ia64/x86 machines?  Or is it in some
> way exclusively tied to x86_64?

It's a HyperTransport card, not PCI/PCIe/PCIX.  It plugs into the HTX
slot on a suitably-equipped motherboard.  On some machines, it's
available on the motherboard itself (e.g. the Linux Networx LS/X.)

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Adrian Bunk
On Sat, Dec 17, 2005 at 07:19:32PM -0800, Andrew Morton wrote:
>...
> In what form is this chip available?  As a standard PCI/PCIX card which
> people will want to plug into power4/ia64/x86 machines?  Or is it in some
> way exclusively tied to x86_64?

Hardware can hardly be exclusively tied to x86_64 without also being 
available on x86 machines since i386 kernels run on x86_64 hardware.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Andi Kleen
Robert Walsh <[EMAIL PROTECTED]> writes:
> 
> Any chance we could get these moved into the x86_64 arch directory,
> then?  We have to do double-word copies, or our chip gets unhappy.

Standard memcpy will do double word copies if everything is suitably
aligned. Just use that.

-Andi
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Andi Kleen
[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Christoph Hellwig <[EMAIL PROTECTED]> writes:
> 
> > please always used fixes-size types for user communication.  also please
> > avoid ioctls like the rest of the IB codebase.
> 
> Could someone please explain to me how the uverbs abuse of write
> is better that ioctl?  

It's actually worse because if they have a 32bit compat issue
then ioctl can be fixed up, but read/write can't. 

I wish the people arguing against ioctl all the time would
just stop that because the alternatives are usually worse.
 
> - 64bit compilers will not pad every structure to 8 bytes.  This
>   only will happen if you happen to have an 8 byte element in your
>   structure that is only aligned to 32bits by a 32bit structure.
>   Unfortunately the 32bit gcc only aligns long long to 32bits on
>   x86, which triggers the described behavior.

Exactly - and driver writers usually don't get that right so we
need to have a tool to fix it up in the end. And with ioctl
that's easiest.

-Andi
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Andrew Morton
Robert Walsh <[EMAIL PROTECTED]> wrote:
>
> > > + movl %edx,%ecx
> > > + shrl $1,%ecx
> > > + andl $1,%edx
> > > + cld
> > > + rep 
> > > + movsq 
> > > + movl %edx,%ecx
> > > + rep
> > > + movsd
> > > + ret
> > 
> > err, we have a portability problem.
> 
> Any chance we could get these moved into the x86_64 arch directory,
> then?

That would make sense.  Give it a non-ipath-related name and require that
all architectures which wish to run this driver must implement that
(documented) function.

And, in Kconfig, make sure that architectures which don't implement that
library function do not attempt to build this driver.  To avoid breaking
`make allmodconfig'.

>  We have to do double-word copies, or our chip gets unhappy.

In what form is this chip available?  As a standard PCI/PCIX card which
people will want to plug into power4/ia64/x86 machines?  Or is it in some
way exclusively tied to x86_64?
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Andrew Morton
Robert Walsh <[EMAIL PROTECTED]> wrote:
>
> > I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h.
> 
> Really?  I was just going to suggest removing it, but if sticking it in
> types.h works for you, then fine.
> 

I think it's a readbility thing.

x += 8; /* wtf? */

vs

x += BITS_PER_BYTE; /* ah! */
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files

2005-12-17 Thread Robert Walsh
On Sat, 2005-12-17 at 19:10 -0800, Andrew Morton wrote:
> Robert Walsh <[EMAIL PROTECTED]> wrote:
> >
> > > > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct 
> > > > page **p)
> > > OK.  It's perhaps not a very well named function.
> > 
> > Really?  Suggestion for a better name?
> > 
> 
> ipath_get_user_pages() would cause the least surprise.

Seems reasonable.  I'll look at the related functions, too.

> > > I don't think we want to be setting the user's VMA's vm_flags in this
> > > manner.  This is purely to retain the physical page across fork?
> > 
> > I didn't write this bit of the driver, but I believe this is the case.
> > Is there a better way of doing this?
> 
> This stuff has been churning a bit lately.  I've drawn Hugh Dickins's
> attention to the patch - he'd have a better handle on what the best
> approach would be.

OK then - I'll wait and see.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files

2005-12-17 Thread Andrew Morton
Robert Walsh <[EMAIL PROTECTED]> wrote:
>
> > > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page 
> > > **p)
> > OK.  It's perhaps not a very well named function.
> 
> Really?  Suggestion for a better name?
> 

ipath_get_user_pages() would cause the least surprise.

> > > + }
> > > + vm->vm_flags |= VM_SHM | VM_LOCKED;
> > > +
> > > + return 0;
> > > +}
> > 
> > I don't think we want to be setting the user's VMA's vm_flags in this
> > manner.  This is purely to retain the physical page across fork?
> 
> I didn't write this bit of the driver, but I believe this is the case.
> Is there a better way of doing this?

This stuff has been churning a bit lately.  I've drawn Hugh Dickins's
attention to the patch - he'd have a better handle on what the best
approach would be.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile

2005-12-17 Thread Robert Walsh
> There's a difference between "technically supported by the driver" and 
> "officially supported for our costumers":
> 
> It's fine if you tell the costumers buying your hardware "anything else 
> than 64bit x86_64 kernels is completely unsupported", but for getting 
> your driver included into the kernel it should be 32bit clean [1] and 
> should also work for people using 32bit kernels on an Opteron.

Fair enough - I'll see what I can do.

> What surprises me is that -O3 turned out to be the fastest flag for you.
> 
> Can you send numbers comparing -Os/-O2/-O3 (without -g3, preferable with 
> gcc 4.0) including a description what and how you are measuring?

I'll try get around to this after my vacation and after we've had time
to absorb and address all the other feedback we received.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile

2005-12-17 Thread Alan Cox
On Sad, 2005-12-17 at 14:54 -0800, Robert Walsh wrote:
> Agreed about the assembler, but one way or the other, x86_64 is the only
> arch we support.

If you need a quad only copy then put it into asm/string.h (asm/io.h if
its operating on I/O space I guess) or somewhere similar as a generic
function that does just that. That allows people to come along and
provide the same functions for other platforms if they need it, and also
makes it possible for others to use this feature if their hardware has
the same feature.

Nobody expects you as a vendor to support it on sparc64, x86-32 or
whatever, nor to write sparc64 asm functions just to make it possible
for someone to do so cleanly later on.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile

2005-12-17 Thread Adrian Bunk
On Sat, Dec 17, 2005 at 02:54:44PM -0800, Robert Walsh wrote:
> > The driver shouldn't use assembler code and therefore no longer depend 
> > on X86_64.
> 
> Agreed about the assembler, but one way or the other, x86_64 is the only
> arch we support.
>...

There's a difference between "technically supported by the driver" and 
"officially supported for our costumers":

It's fine if you tell the costumers buying your hardware "anything else 
than 64bit x86_64 kernels is completely unsupported", but for getting 
your driver included into the kernel it should be 32bit clean [1] and 
should also work for people using 32bit kernels on an Opteron.

> > -O3 doesn't make much sense since the fight for producing the fastest 
> > code is between -O2 and -Os.
> 
> Makes many nanoseconds of difference to us for our latency numbers.  At
> the low latency numbers we measuring (1.29us), this is a very important
> difference to our customers.
>...

There's no doubt that this is important for your customers.

What surprises me is that -O3 turned out to be the fastest flag for you.

Can you send numbers comparing -Os/-O2/-O3 (without -g3, preferable with 
gcc 4.0) including a description what and how you are measuring?

> Regards,
>  Robert.

cu
Adrian

[1] not long ago, it used to be the other way round that drivers weren't
64bit clean...

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile

2005-12-17 Thread Robert Walsh
> The driver shouldn't use assembler code and therefore no longer depend 
> on X86_64.

Agreed about the assembler, but one way or the other, x86_64 is the only
arch we support.

> -Wall is always set when compiling the kernel.

Fine.

> -O3 doesn't make much sense since the fight for producing the fastest 
> code is between -O2 and -Os.

Makes many nanoseconds of difference to us for our latency numbers.  At
the low latency numbers we measuring (1.29us), this is a very important
difference to our customers.

> You don't want to always compile your driver with -g3.

Good point.  I'll ask around here why we're doing this.

> > +_ipath_idstr:="$$""Id: kernel.org InfiniPath Release 1.1 $$"" $$""Date: 
> > $(shell date +%F-%R)"" $$"
> > +EXTRA_CFLAGS += -D_IPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0
> >...
> 
> Please move the _IPATH_IDSTR revision tag to a header file and remove 
> IPATH_KERN_TYPE.

I'll see what I can do.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Robert Walsh
> and opterons can already run 2 architectures. And the HT bus is a
> generic bus.. with public specs. Others than just AMD use it as well.
> 
> also.. what is wrong with memcpy and co ?

Our chips can only handle double-word writes.  memcpy isn't guaranteed
to do this.

> then you need to use readl() and family most like; they already take
> care of this anyway.

Oh, OK then.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Robert Walsh
On Sat, 2005-12-17 at 12:38 -0800, Andrew Morton wrote:
> Roland Dreier <[EMAIL PROTECTED]> wrote:
> >
> > +   .globl ipath_dwordcpy
> > +/* rdi destination, rsi source, rdx count */
> > +ipath_dwordcpy:
> > +   movl %edx,%ecx
> > +   shrl $1,%ecx
> > +   andl $1,%edx
> > +   cld
> > +   rep 
> > +   movsq 
> > +   movl %edx,%ecx
> > +   rep
> > +   movsd
> > +   ret
> 
> err, we have a portability problem.

Any chance we could get these moved into the x86_64 arch directory,
then?  We have to do double-word copies, or our chip gets unhappy.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Robert Walsh
> > +#define yield() sched_yield()
> 
> Some might get upset about what I assume is userspace test harness code or
> what _is_ this doing?) in a driver.  But if the maintainers find it useful
> we can live with it,

That is cosimulator code.  It's easy enough to remove.  I'll look into
it.

> > +#ifndef _BITS_PER_BYTE
> > +#define _BITS_PER_BYTE 8
> > +#endif
> 
> I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h.

Really?  I was just going to suggest removing it, but if sticking it in
types.h works for you, then fine.

> > +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> > +__attribute__ ((always_inline));
> 
> s/__inline__/inline/ throughout.

OK.

> > +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))
> 
> We have ALIGN()

Yup.

> > +struct ipath_int_vec {
> > +   int long long addr;
> 
> long long

OK.

> > +#define IPATH_USERINIT  _IOW('s', 16, struct ipath_user_info)
> > +/* init;  kernel/chip params to user */
> > +#define IPATH_BASEINFO  _IOR('s', 17, struct ipath_base_info)
> > +/* send a packet */
> > +#define IPATH_SENDPKT   _IOW('s', 18, struct ipath_sendpkt)
> 
> uh-oh.  ioctls.  Do we have compat conversions for them all, if needed?

For those that are needed, I believe we covered them all.  Some have
suggested removing ioctls.  I'm willing to look into alternatives, but
if you think they're OK, I'd rather leave them.

> > +/*
> > + * A segment is a linear region of low physical memory.
> > + * XXX Maybe we should use phys addr here and kmap()/kunmap()
> > + * Used by the verbs layer.
> > + */
> > +struct ipath_seg {
> > +   void *vaddr;
> > +   u64 length;
> > +};
> 
> Suggest `long' for the length.  We don't need 64 bits on 32-bit machines.

OK.

> > +struct ipath_mregion {
> > +   u64 user_base;  /* User's address for this region */
> 
> void *.
> 
> > +   u64 iova;   /* IB start address of this region */
> 
> Maybe here too.

OK.

> > +int ipath_mlock(unsigned long, size_t, struct page **);
> 
> Sometimes it does `int foo()' and sometimes `extern int foo()'.  I tend to
> think the `extern' is a waste of space.

Yup.

> > +#define ipath_func_krecord(a)
> > +#define ipath_func_urecord(a, b)
> > +#define ipath_func_mrecord(a, b)
> > +#define ipath_func_rkrecord(a)
> > +#define ipath_func_rurecord(a, b)
> > +#define ipath_func_rmrecord(a, b)
> > +#define ipath_func_rsrecord(a)
> > +#define ipath_func_rcrecord(a)
> 
> What are all these doing?   Might need do{}while(0) for safety.

I'll look at cleaning them out.  Probably left-overs from some earlier
experiment.

> > +#ifdef IPATH_COSIM
> > +extern __u32 sim_readl(const volatile void __iomem * addr);
> > +extern __u64 sim_readq(const volatile void __iomem * addr);
> 
> The driver has a strange mixture of int32_t, s32 and __s32.  s32 is
> preferred.

Yea - I'll clean that up.

> > + */
> > +static __inline__ uint32_t ipath_kget_ureg32(const ipath_type stype,
> > +ipath_ureg regno, int port)
> > +{
> > +   uint64_t *ubase;
> > +
> > +   ubase = (uint64_t *) (devdata[stype].ipath_uregbase
> > + + (char *)devdata[stype].ipath_kregbase
> > + + devdata[stype].ipath_palign * port);
> > +   return ubase ? ipath_readl(ubase + regno) : 0;
> > +}
> 
> Are all these u64's needed on 32-bit?

Don't know - I'll ask around.  We don't support the hardware in 32-bit
anyway, so...

> > +static __inline__ uint64_t ipath_kget_kreg64(const ipath_type stype,
> > +ipath_kreg regno)
> > +{
> > +   if (!devdata[stype].ipath_kregbase)
> > +   return ~0ULL;
> 
> We don't know that the architecture implements u64 as unsigned long long. 
> Some use unsigned long.  Best way of implmenting the all-ones pattern is
> just `-1'.

OK.

> Gee.  Big driver.

Tell me about it :-)  Basically, we're doing infiniband in software: no
offload.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Arjan van de Ven
On Sat, 2005-12-17 at 14:19 -0800, Robert Walsh wrote:
> > > +{
> > > + void *ssv, *dsv;
> > > + uint32_t csv;
> > > + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> > > +  "=&S"(ssv)
> > > +  :"0"(cnt), "1"(dest), "2"(src)
> > > +  :"memory");
> > > +}
> > 
> > No way we're gonna put assembler code into such a driver.
> 
> Why not?  The chip (and therefore the driver) only works with Opterons.
> It's tied to the HT bus, but PCI or anything like that.

and opterons can already run 2 architectures. And the HT bus is a
generic bus.. with public specs. Others than just AMD use it as well.

also.. what is wrong with memcpy and co ?

> > > +static __inline__ uint32_t ipath_kget_kreg32(const ipath_type stype,
> > > +  ipath_kreg regno)
> > > +{
> > > + volatile uint32_t *kreg32;
> > > +
> > > + if (!devdata[stype].ipath_kregbase)
> > > + return ~0;
> > > +
> > > + kreg32 = (volatile uint32_t *)&devdata[stype].ipath_kregbase[regno];
> > 
> > volatile use is probably always wrong. but this whole functions looks like
> > a very odd wrapper anyway?
> 
> The volatile is there so the compiler doesn't optimize away the read.
> This is important, because reads of our hardware have side-effects and
> cannot be optimized out.

then you need to use readl() and family most like; they already take
care of this anyway.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Robert Walsh
> > + * $Id: ipath_common.h 4491 2005-12-15 22:20:31Z rjwalsh $
> 
> please remove RCSIDs everywhere.

These are everywhere in the OpenIB code.  I was actually asked by one of
the OpenIB developers to include them.  I'm happy to remove them again,
but what do the OpenIB folks think?

> > +#define yield() sched_yield()
> 
> Please push this out.  It's fine if they reuse kernel-code in userspace
> this way, but please move the compat wrappers to a separate file that's
> not in the kernel tree.  

I will do this.

> > +typedef uint8_t ipath_type;
> 
> totally meaningless typedef

In what way?

> > +#ifndef _BITS_PER_BYTE
> > +#define _BITS_PER_BYTE 8
> > +#endif
> 
> WTF?

Hmm.  That is odd.  I'll ask the folks here if we can remove this.

> in kernel land we __inline__ includes always_inline.  Also no need for
> a separate prototype for a just following inline function.

Fine.

> > +{
> > +   void *ssv, *dsv;
> > +   uint32_t csv;
> > +   __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> > +"=&S"(ssv)
> > +:"0"(cnt), "1"(dest), "2"(src)
> > +:"memory");
> > +}
> 
> No way we're gonna put assembler code into such a driver.

Why not?  The chip (and therefore the driver) only works with Opterons.
It's tied to the HT bus, but PCI or anything like that.

> > +struct ipath_int_vec {
> > +   int long long addr;
> > +   uint32_t info;
> > +};
> 
> 
> please always used fixes-size types for user communication.

OK.

> also please
> avoid ioctls like the rest of the IB codebase.

More complex, but I'll look into it.

> > +/* Similarly, this is the kernel version going back to the user.  It's 
> > slightly
> > + * different, in that we want to tell if the driver was built as part of a
> > + * PathScale release, or from the driver from the OpenIB, kernel.org, or a
> > + * standard distribution, for support reasons.  The high bit is 0 for
> > + * non-PathScale, and 1 for PathScale-built/supplied.  That bit is defined
> > + * in Makefiles, rather than this file.
> > + *
> > + * It's returned by the driver to the user code during initialization
> > + * in the spi_sw_version field of ipath_base_info, so the user code can
> > + * in turn check for compatibility with the kernel.
> > +*/
> > +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION)
> 
> NACK, there's no way we're gonna put in a way to identify an "official"
> version.  The official version is the last one in mainline always.

Why make this hard for vendors?  You may only care about the latest
mainline, but if we want to sell chips, we have to support this all the
way back to 2.6.9 (RHEL).

> > +#ifndef PCI_VENDOR_ID_PATHSCALE/* not in pci.ids yet */
> > +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1
> > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH1 0xa
> > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH2 0xd
> > +#endif
> 
> so move it there?

Sounds like a good idea.  I'll submit a separate patch.

> > +typedef struct _ipath_portdata {
> 
> please avoid typedefs for struct types.

I thought I had, but I must have missed this one.

> > +/*
> > + * these should be somewhat dynamic someday, although they are fixed
> > + * for all users of the device on any given load.
> > + *
> > + * NOTE: There is a VM bug in the 2.4 Kernels similar to the one Dave
> > + * fixed in the 2.6 Kernel.  When using large or discontinuous memory,
> > + * we get random kernel oops.  So, in 2.4, we are just going to stick
> > + * with 4k chunks instead of 64k chunks.
> > + */
> 
> No one cares about 2.4 kernels here.

Fine.

> > + * these function similarly to the mlock/munlock system calls.
> > + * ipath_mlock() is used to pin an address range (if not already pinned),
> > + * and optionally return the list of physical addresses
> > + * ipath_munlock() does the obvious, and ipath_mlock() cleans up all 
> > + * private memory, used at driver unload.
> > + * ipath_mlock_nocopy() is similar to mlock, but only one page, and marks
> > + * the vm so the page isn't taken away on a fork.
> > + */
> > +int ipath_mlock(unsigned long, size_t, struct page **);
> > +int ipath_mlock_nocopy(unsigned long, struct page **);
> 
> this kind of thing definitly doesn't belong into an LLDD.  or maybe
> it's just stale prototypes?

No - they're used.  Why do you say they don't belong?

> > +#ifdef IPATH_COSIM
> > +extern __u32 sim_readl(const volatile void __iomem * addr);
> > +extern __u64 sim_readq(const volatile void __iomem * addr);
> > +extern void sim_writel(__u32 val, volatile void __iomem * addr);
> > +extern void sim_writeq(__u64 val, volatile void __iomem * addr);
> > +#define ipath_readl(addr) sim_readl(addr)
> > +#define ipath_readq(addr) sim_readq(addr)
> > +#define ipath_writel(val, addr) sim_writel(val, addr)
> > +#define ipath_writeq(val, addr) sim_writeq(val, addr)
> > +#else
> > +#define ipath_readl(addr) readl(addr)
> > +#define ipath_readq(addr) readq(addr)
> > +#define ipath_writel(val, a

[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Robert Walsh
> Do we really need this ugly userspace emulation code in the kernel?

Probably not.  I'll look into removing it.

> What is this used for? Why can't yo use memcpy?

Our chip can only handle double-word copies.

> > +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))
> 
> Please use ALIGN().

Fair enough.

> You seem to be introducing loads of new ioctls. Any reason you can't
> use sysfs and/or configfs?

I'll see what people here think of that idea.

> > +#define ips_get_ipath_ver(ipath_header) (((ipath_header) >> 
> > INFINIPATH_I_VERS_SHIFT) \
> > +& INFINIPATH_I_VERS_MASK)
> 
> Please use static inlines instead for readability.

OK.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile

2005-12-17 Thread Adrian Bunk
On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote:

>...
> --- /dev/null
> +++ b/drivers/infiniband/hw/ipath/Kconfig
> @@ -0,0 +1,18 @@
> +config IPATH_CORE
> + tristate "PathScale InfiniPath Driver"
> + depends on PCI_MSI && X86_64
>...

The driver shouldn't use assembler code and therefore no longer depend 
on X86_64.

> --- /dev/null
> +++ b/drivers/infiniband/hw/ipath/Makefile
> @@ -0,0 +1,15 @@
> +EXTRA_CFLAGS += -Idrivers/infiniband/include
> +
> +EXTRA_CFLAGS += -Wall -O3 -g3

-Wall is always set when compiling the kernel.

-O3 doesn't make much sense since the fight for producing the fastest 
code is between -O2 and -Os.

You don't want to always compile your driver with -g3.

> +_ipath_idstr:="$$""Id: kernel.org InfiniPath Release 1.1 $$"" $$""Date: 
> $(shell date +%F-%R)"" $$"
> +EXTRA_CFLAGS += -D_IPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0
>...

Please move the _IPATH_IDSTR revision tag to a header file and remove 
IPATH_KERN_TYPE.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Eric W. Biederman
Christoph Hellwig <[EMAIL PROTECTED]> writes:

> please always used fixes-size types for user communication.  also please
> avoid ioctls like the rest of the IB codebase.

Could someone please explain to me how the uverbs abuse of write
is better that ioctl?  

Every single command seems to have a __u64 response fields that is a
pointer into user space.  When you write your commands and read your
responses like the netlink layer does I can see the sense of it.  But
making write an ioctl by another name...

One of the scarier comments I have seen lately from ib_user_verbs.h
/*
 * Make sure that all structs defined in this file remain laid out so
 * that they pack the same way on 32-bit and 64-bit architectures (to
 * avoid incompatibility between 32-bit userspace and 64-bit kernels).
 * Specifically:
 *  - Do not use pointer types -- pass pointers in __u64 instead.
 *  - Make sure that any structure larger than 4 bytes is padded to a
 *multiple of 8 bytes.  Otherwise the structure size will be
 *different between 32-bit and 64-bit architectures.
 */

The two points that get called out.
- Embedded pointers are a large part of what make ioctl a maintenance
  nightmare.  I admit we are 15-20 years away before big machines
  exhaust the capability of 64bit pointers so we aren't likely
  to run into size issues soon.  But a write that changes your address
  space is ugly, and unexpected.

  What looks like a reimplementation of readv/writev using this
  technique is also scary.

- 64bit compilers will not pad every structure to 8 bytes.  This
  only will happen if you happen to have an 8 byte element in your
  structure that is only aligned to 32bits by a 32bit structure.
  Unfortunately the 32bit gcc only aligns long long to 32bits on
  x86, which triggers the described behavior.



Eric
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Robert Walsh
On Sat, 2005-12-17 at 14:38 +0200, Pekka Enberg wrote:
> On 12/17/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> > +#define TRUE 1
> > +#define FALSE 0
> 
> Please kill these.

OK.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 04/13] [RFC] ipath LLD core, part 1

2005-12-17 Thread Robert Walsh
On Sat, 2005-12-17 at 12:38 -0800, Andrew Morton wrote:
> Roland Dreier <[EMAIL PROTECTED]> wrote:
> >
> > +   if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) {
> > +   _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret);
> > +   return ret;
> > +   }
> 
> The driver does this quite a lot.  copy_from_user() will return the number
> of bytes remaining to copy.  So I think we'll be wanting `return -EFAULT;'
> in lots of places rather than this.

Thanks.  Will fix.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files

2005-12-17 Thread Robert Walsh
> > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page 
> > **p)
> OK.  It's perhaps not a very well named function.

Really?  Suggestion for a better name?

> > +   }
> > +   vm->vm_flags |= VM_SHM | VM_LOCKED;
> > +
> > +   return 0;
> > +}
> 
> I don't think we want to be setting the user's VMA's vm_flags in this
> manner.  This is purely to retain the physical page across fork?

I didn't write this bit of the driver, but I believe this is the case.
Is there a better way of doing this?

> > +int ipath_munlock(size_t num_pages, struct page **p)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < num_pages; i++) {
> > +   _IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]);
> > +   SetPageDirty(p[i]);
> > +   put_page(p[i]);
> > +   }
> > +   return 0;
> > +}
> 
> Nope, SetPageDirty() doesn't tell the VM that the page is dirty - it'll
> never get written out.  Use set_page_dirty_lock().

OK.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files

2005-12-17 Thread Robert Walsh
> > +void ipath_init_picotime(void)
> > +{
> > +   int i;
> > +   u_int64_t ts, te, delta = -1ULL;
> > +
> > +   for (i = 0; i < 5; i++) {
> > +   ts = get_cycles();
> > +   udelay(250);
> > +   te = get_cycles();
> > +   if ((te - ts) < delta)
> > +   delta = te - ts;
> > +   yield();
> > +   }
> > +   _ipath_pico_per_cycle = 25000 / delta;
> > +}
> 
> hm, I hope this is debug code which is going away.  If not, we should take
> a look at what it's trying to do here.

This isn't debug code.  It's used to calculate the roughly number
picoseconds per cycle.  This is used in the driver to make sure
HyperTransport reads haven't timed out (see ipath_snap_cntr in
ipath_driver.c) when reading chip counters.  If you can think of a
better way of figuring out whether a read took greater than a certain
length of time, I'd be interested in knowing it.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [git patch review 2/7] IB/mthca: correct log2 calculation

2005-12-17 Thread Michael S. Tsirkin
Quoting r. Andrew Morton <[EMAIL PROTECTED]>:
> Subject: Re: [git patch review 2/7] IB/mthca: correct log2 calculation
> 
> Roland Dreier <[EMAIL PROTECTED]> wrote:
> >
> > Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next
> > power of 2 -- it should be fls(x - 1).
> 
> Please use round_up_pow_of_two().

Yes, but we want the bit number. roundup_pow_of_two does a shift.

-- 
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 08/13] [RFC] ipath core last bit

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> +EXPORT_SYMBOL(ipath_kset_linkstate);
> +EXPORT_SYMBOL(ipath_kset_mtu);
> +EXPORT_SYMBOL(ipath_layer_close);
> +EXPORT_SYMBOL(ipath_layer_get_bcast);
> +EXPORT_SYMBOL(ipath_layer_get_cr_errpkey);
> +EXPORT_SYMBOL(ipath_layer_get_deviceid);
> +EXPORT_SYMBOL(ipath_layer_get_flags);
> +EXPORT_SYMBOL(ipath_layer_get_guid);
> +EXPORT_SYMBOL(ipath_layer_get_ibmtu);
> etc

EXPORT_SMBOL_GPL?
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> ...
> +/*
> + * This isn't perfect, but it's close enough for timing work. We want this
> + * to work on systems where the cycle counter isn't the same as the clock
> + * frequency.  The one msec spin is OK, since we execute this only once
> + * when first loaded.  We don't use CURRENT_TIME because on some systems
> + * it only has jiffy resolution; we just assume udelay is well calibrated
> + * and that we aren't likely to be rescheduled.  Do it multiple times,
> + * with a yield in between, to try to make sure we get the "true minimum"
> + * value.
> + * _ipath_pico_per_cycle isn't going to lead to completely accurate
> + * conversions from timestamps to nanoseconds, but it's close enough
> + * for our purposes, which is mainly to allow people to show events with
> + * nsecs or usecs if desired, rather than cycles.
> + */
> +void ipath_init_picotime(void)
> +{
> + int i;
> + u_int64_t ts, te, delta = -1ULL;
> +
> + for (i = 0; i < 5; i++) {
> + ts = get_cycles();
> + udelay(250);
> + te = get_cycles();
> + if ((te - ts) < delta)
> + delta = te - ts;
> + yield();
> + }
> + _ipath_pico_per_cycle = 25000 / delta;
> +}

hm, I hope this is debug code which is going away.  If not, we should take
a look at what it's trying to do here.


> +/*
> + * Our version of the kernel mlock function.  This function is no longer
> + * exposed, so we need to do it ourselves.  It takes a given start page
> + * (page aligned user virtual address) and pins it and the following 
> specified
> + * number of pages.
> + * For now, num_pages is always 1, but that will probably change at some
> + * point (because caller is doing expected sends on a single virtually
> + * contiguous buffer, so we can do all pages at once).
> + */
> +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p)
> +{
> + int n;
> +
> + _IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page);
> + down_read(¤t->mm->mmap_sem);
> + n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1,
> +p, NULL);
> + up_read(¤t->mm->mmap_sem);
> + if (n != num_pages) {
> + _IPATH_INFO
> + ("get_user_pages (0x%lx pages starting at 0x%lx failed with 
> %d\n",
> +  num_pages, start_page, n);
> + if (n < 0)  /* it's an errno */
> + return n;
> + return -ENOMEM; /* no way to know actual error */
> + }
> +
> + return 0;
> +}

OK.  It's perhaps not a very well named function.

> +/*
> + * this is similar to ipath_mlock, but it's always one page, and we mark
> + * the page as locked for i/o, and shared.  This is used for the user process
> + * page that contains the destination address for the rcvhdrq tail update,
> + * so we need to have the vma.  If we don't do this, the page can be taken
> + * away from us on fork, even if the child never touches it, and then
> + * the user process never sees the tail register updates.
> + */
> +int ipath_mlock_nocopy(unsigned long start_page, struct page **p)
> +{
> + int n;
> + struct vm_area_struct *vm = NULL;
> +
> + down_read(¤t->mm->mmap_sem);
> + n = get_user_pages(current, current->mm, start_page, 1, 1, 1, p, &vm);
> + up_read(¤t->mm->mmap_sem);
> + if (n != 1) {
> + _IPATH_INFO("get_user_pages for 0x%lx failed with %d\n",
> + start_page, n);
> + if (n < 0)  /* it's an errno */
> + return n;
> + return -ENOMEM; /* no way to know actual error */
> + }
> + vm->vm_flags |= VM_SHM | VM_LOCKED;
> +
> + return 0;
> +}

I don't think we want to be setting the user's VMA's vm_flags in this
manner.  This is purely to retain the physical page across fork?

> +/*
> + * Our version of the kernel munlock function.  This function is no longer
> + * exposed, so we need to do it ourselves.  It unpins the start page
> + * (a page aligned full user virtual address, not a page number)
> + * and pins it and the following specified number of pages.
> + */
> +int ipath_munlock(size_t num_pages, struct page **p)
> +{
> + int i;
> +
> + for (i = 0; i < num_pages; i++) {
> + _IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]);
> + SetPageDirty(p[i]);
> + put_page(p[i]);
> + }
> + return 0;
> +}

Nope, SetPageDirty() doesn't tell the VM that the page is dirty - it'll
never get written out.  Use set_page_dirty_lock().


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 04/13] [RFC] ipath LLD core, part 1

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> + if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) {
> + _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret);
> + return ret;
> + }

The driver does this quite a lot.  copy_from_user() will return the number
of bytes remaining to copy.  So I think we'll be wanting `return -EFAULT;'
in lots of places rather than this.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> + .globl ipath_dwordcpy
> +/* rdi   destination, rsi source, rdx count */
> +ipath_dwordcpy:
> + movl %edx,%ecx
> + shrl $1,%ecx
> + andl $1,%edx
> + cld
> + rep 
> + movsq 
> + movl %edx,%ecx
> + rep
> + movsd
> + ret

err, we have a portability problem.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> ...
>
> +#ifdef __KERNEL__
> +#include 
> +#include 
> +#include 
> +#else/* !__KERNEL__; user mode */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* these aren't implemented for user mode, which is OK until we multi-thread 
> */
> +typedef struct _atomic {
> + uint32_t counter;
> +} atomic_t;  /* no atomic_t type in user-land */
> +#define atomic_set(a,v) ((a)->counter = (v))
> +#define atomic_inc_return(a)  (++(a)->counter)
> +#define likely(x) (x)
> +#define unlikely(x) (x)
> +
> +#define yield() sched_yield()

Some might get upset about what I assume is userspace test harness code or
what _is_ this doing?) in a driver.  But if the maintainers find it useful
we can live with it,

> +#ifndef _BITS_PER_BYTE
> +#define _BITS_PER_BYTE 8
> +#endif

I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h.

> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> +__attribute__ ((always_inline));

s/__inline__/inline/ throughout.

> +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))

We have ALIGN()

> +struct ipath_int_vec {
> + int long long addr;

long long

> + uint32_t info;
> +};
> +struct ipath_eeprom_req {
> + long long addr;

Like this.

> + uint16_t len;
> + uint16_t offset;
> +};
> ...
> +#define IPATH_USERINIT  _IOW('s', 16, struct ipath_user_info)
> +/* init;  kernel/chip params to user */
> +#define IPATH_BASEINFO  _IOR('s', 17, struct ipath_base_info)
> +/* send a packet */
> +#define IPATH_SENDPKT   _IOW('s', 18, struct ipath_sendpkt)

uh-oh.  ioctls.  Do we have compat conversions for them all, if needed?

> +/*
> + * A segment is a linear region of low physical memory.
> + * XXX Maybe we should use phys addr here and kmap()/kunmap()
> + * Used by the verbs layer.
> + */
> +struct ipath_seg {
> + void *vaddr;
> + u64 length;
> +};

Suggest `long' for the length.  We don't need 64 bits on 32-bit machines.

> +struct ipath_mregion {
> + u64 user_base;  /* User's address for this region */

void *.

> + u64 iova;   /* IB start address of this region */

Maybe here too.

> +int ipath_mlock(unsigned long, size_t, struct page **);

Sometimes it does `int foo()' and sometimes `extern int foo()'.  I tend to
think the `extern' is a waste of space.

> +#define ipath_func_krecord(a)
> +#define ipath_func_urecord(a, b)
> +#define ipath_func_mrecord(a, b)
> +#define ipath_func_rkrecord(a)
> +#define ipath_func_rurecord(a, b)
> +#define ipath_func_rmrecord(a, b)
> +#define ipath_func_rsrecord(a)
> +#define ipath_func_rcrecord(a)

What are all these doing?   Might need do{}while(0) for safety.

> +#ifdef IPATH_COSIM
> +extern __u32 sim_readl(const volatile void __iomem * addr);
> +extern __u64 sim_readq(const volatile void __iomem * addr);

The driver has a strange mixture of int32_t, s32 and __s32.  s32 is
preferred.

> + */
> +static __inline__ uint32_t ipath_kget_ureg32(const ipath_type stype,
> +  ipath_ureg regno, int port)
> +{
> + uint64_t *ubase;
> +
> + ubase = (uint64_t *) (devdata[stype].ipath_uregbase
> +   + (char *)devdata[stype].ipath_kregbase
> +   + devdata[stype].ipath_palign * port);
> + return ubase ? ipath_readl(ubase + regno) : 0;
> +}

Are all these u64's needed on 32-bit?

> +static __inline__ uint64_t ipath_kget_kreg64(const ipath_type stype,
> +  ipath_kreg regno)
> +{
> + if (!devdata[stype].ipath_kregbase)
> + return ~0ULL;

We don't know that the architecture implements u64 as unsigned long long. 
Some use unsigned long.  Best way of implmenting the all-ones pattern is
just `-1'.



Gee.  Big driver.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [git patch review 2/7] IB/mthca: correct log2 calculation

2005-12-17 Thread Andrew Morton
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next
> power of 2 -- it should be fls(x - 1).

Please use round_up_pow_of_two().
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [RFC] IB_AT_MOST

2005-12-17 Thread Michael S. Tsirkin
Quoting r. Fab Tillier <[EMAIL PROTECTED]>:
> Subject: RE: [RFC] IB_AT_MOST
> 
> Hi Michael,
> 
> > From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 16, 2005 5:58 AM
> > 
> > Hi!
> > I recently noted that some middleware seems to use the "as much
> > as possible" approach, for example, using maximum possible value
> > for max_rd_atomic or other fields, in create/modify qp.
> > 
> > An obvious thing could be to perform query_device and use max.
> > values from there. However, it turns out that hardware max supported
> > values might not be easy to express in terms of a single constant.
> > Consider for example the max number of s/g entries supported per
> > WQE: mellanox HCAs support different number of these for RC and UD
> > QPs. So whatever single number query device reports, using it will
> > never achieve what the user wants for all QP types.
> > 
> > Rather than extending the device query for all thinkable hardware
> > weirdness, I'd like to propose, instead, the following API extension
> > (below): passing a negative value in e.g. qp attribute would have the
> > meaning: let hardware use at most the specified value.
> > This, as opposed to the usual "at least the specified value" meaning
> > for positive values.
> > 
> > How does the following work, for an API? Please comment.
> 
> I don't understand the IB_AT_MOST macro.  If someone uses IB_AT_MOST( 1) and
> the hardware supports 4, they will get 4, which is definitely not "at most 1".

Yes, but we could easily fix this in the hardware provider so that they get 1.

> I would rename it to IB_MAX, and define it a -1 or something like that.

This is an option, too.

-- 
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 00/13] [RFC] IB: PathScale InfiniPath driver

2005-12-17 Thread Roland Dreier
Christoph> Is there some political plot going where pathscale
Christoph> folks are forcing you to send this out in this scheme?
Christoph> Otherwise I couldn't explain the code quality
Christoph> magnitudes lower than normally expected from your
Christoph> merges.

No political plot -- this posting was an RFC in the literal sense,
with no expectation that the code is mergable as-is.  I just want to
get comments early so that we have a better idea of what needs to be
fixed.  For example, what's your feeling about sysctls in drivers?

BTW, Pathscale people -- please respond to the comments that are made
about your driver...

 - R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: SDP problem on SVN 4507

2005-12-17 Thread Michael S. Tsirkin
Quoting r. Bob Woodruff <[EMAIL PROTECTED]>:
> Subject: SDP problem on SVN 4507
> 
>  
> I am seeing a strange problem with SDP on SVN4507. 
> 
> When I run NetPIPE over SDP by itself, it runs just fine.
> However, if I run MPI over uDAPL/CMA at the same
> time, I seem to have a problem.
> I start 2 copies of MPI running Intel MPI benchmark. Then,
> I start the NetPIPE server and it starts to listen waiting for a connect
> request.
> Then I start the client side and it fails (on the connect() call) with
> an
> errno of 111. If I then stop the MPI/uDAPL/CMA jobs,
> SDP/NetPIPE can then connect OK. Not sure if this is an SDP issue 
> or some problem with CMA/CM. 
> 
> Has anyone else seen a similar behavior ?
> 
> 
> woody
> 

I hope this will be resolved with the move to CMA.


-- 
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Christoph Hellwig
On Fri, Dec 16, 2005 at 03:48:54PM -0800, Roland Dreier wrote:
> Copy routines for ipath driver

NACK, assembler copy routines don't belong into drivers.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 00/13] [RFC] IB: PathScale InfiniPath driver

2005-12-17 Thread Christoph Hellwig
On Fri, Dec 16, 2005 at 03:48:54PM -0800, Roland Dreier wrote:
> having sysctls that set values also settable through module parameters
> under /sys/module, code inside #ifndef __KERNEL__ so include files can
> be shared with other PathScale code, code in ipath_i2c.c that might be
> simplified by using drivers/i2c, etc.  I'd like to try to get a sense
> of whether I'm being too picky or whether PathScale really does need
> to fix these up before the driver is merged.

Yes, please fix this stuff before.  The current driver looks like a horrible
mess.

Is there some political plot going where pathscale folks are forcing you to
send this out in this scheme?  Otherwise I couldn't explain the code quality
magnitudes lower than normally expected from your merges.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Christoph Hellwig
> + * $Id: ipath_common.h 4491 2005-12-15 22:20:31Z rjwalsh $

please remove RCSIDs everywhere.

> +#ifdef __KERNEL__
> +#include 
> +#include 
> +#include 
> +#else/* !__KERNEL__; user mode */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* these aren't implemented for user mode, which is OK until we multi-thread 
> */
> +typedef struct _atomic {
> + uint32_t counter;
> +} atomic_t;  /* no atomic_t type in user-land */
> +#define atomic_set(a,v) ((a)->counter = (v))
> +#define atomic_inc_return(a)  (++(a)->counter)
> +#define likely(x) (x)
> +#define unlikely(x) (x)
> +
> +#define yield() sched_yield()

Please push this out.  It's fine if they reuse kernel-code in userspace
this way, but please move the compat wrappers to a separate file that's
not in the kernel tree.  

> +typedef uint8_t ipath_type;

totally meaningless typedef

> +#ifndef _BITS_PER_BYTE
> +#define _BITS_PER_BYTE 8
> +#endif

WTF?

> +
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> +__attribute__ ((always_inline));



> +/*
> + * this is used for very short copies, usually 1 - 8 bytes,
> + * *NEVER* to the PIO buffers!!!  use ipath_dwordcpy for longer
> + * copies, or any copy to the PIO buffers.  Works for 32 and 64 bit
> + * gcc and pathcc
> + */
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)

in kernel land we __inline__ includes always_inline.  Also no need for
a separate prototype for a just following inline function.

> +{
> + void *ssv, *dsv;
> + uint32_t csv;
> + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> +  "=&S"(ssv)
> +  :"0"(cnt), "1"(dest), "2"(src)
> +  :"memory");
> +}

No way we're gonna put assembler code into such a driver.

> +struct ipath_int_vec {
> + int long long addr;
> + uint32_t info;
> +};


please always used fixes-size types for user communication.  also please
avoid ioctls like the rest of the IB codebase.

> +/* Similarly, this is the kernel version going back to the user.  It's 
> slightly
> + * different, in that we want to tell if the driver was built as part of a
> + * PathScale release, or from the driver from the OpenIB, kernel.org, or a
> + * standard distribution, for support reasons.  The high bit is 0 for
> + * non-PathScale, and 1 for PathScale-built/supplied.  That bit is defined
> + * in Makefiles, rather than this file.
> + *
> + * It's returned by the driver to the user code during initialization
> + * in the spi_sw_version field of ipath_base_info, so the user code can
> + * in turn check for compatibility with the kernel.
> +*/
> +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION)

NACK, there's no way we're gonna put in a way to identify an "official"
version.  The official version is the last one in mainline always.

> +#ifndef PCI_VENDOR_ID_PATHSCALE  /* not in pci.ids yet */
> +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1
> +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH1 0xa
> +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH2 0xd
> +#endif

so move it there?

> +typedef struct _ipath_portdata {

please avoid typedefs for struct types.

> +/*
> + * these should be somewhat dynamic someday, although they are fixed
> + * for all users of the device on any given load.
> + *
> + * NOTE: There is a VM bug in the 2.4 Kernels similar to the one Dave
> + * fixed in the 2.6 Kernel.  When using large or discontinuous memory,
> + * we get random kernel oops.  So, in 2.4, we are just going to stick
> + * with 4k chunks instead of 64k chunks.
> + */

No one cares about 2.4 kernels here.

> + * these function similarly to the mlock/munlock system calls.
> + * ipath_mlock() is used to pin an address range (if not already pinned),
> + * and optionally return the list of physical addresses
> + * ipath_munlock() does the obvious, and ipath_mlock() cleans up all 
> + * private memory, used at driver unload.
> + * ipath_mlock_nocopy() is similar to mlock, but only one page, and marks
> + * the vm so the page isn't taken away on a fork.
> + */
> +int ipath_mlock(unsigned long, size_t, struct page **);
> +int ipath_mlock_nocopy(unsigned long, struct page **);

this kind of thing definitly doesn't belong into an LLDD.  or maybe
it's just stale prototypes?

> +#ifdef IPATH_COSIM
> +extern __u32 sim_readl(const volatile void __iomem * addr);
> +extern __u64 sim_readq(const volatile void __iomem * addr);
> +extern void sim_writel(__u32 val, volatile void __iomem * addr);
> +extern void sim_writeq(__u64 val, volatile void __iomem * addr);
> +#define ipath_readl(addr) sim_readl(addr)
> +#define ipath_readq(addr) sim_readq(addr)
> +#define ipath_writel(val, addr) sim_writel(val, addr)
> +#define ipath_writeq(val, addr) sim_writeq(val, addr)
> +#else
> +#define ipath_readl(addr) readl(addr)
> +#define ipath_readq(addr) readq(addr)
> +#define ipath_

[openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines

2005-12-17 Thread Pekka Enberg
On 12/17/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> +#define TRUE 1
> +#define FALSE 0

Please kill these.

 Pekka
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers

2005-12-17 Thread Pekka Enberg
Hi Roland,

On 12/17/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
> +/*
> + * This file contains defines, structures, etc. that are used
> + * to communicate between kernel and user code.
> + */
> +
> +#ifdef __KERNEL__
> +#include 
> +#include 
> +#include 
> +#else  /* !__KERNEL__; user mode */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* these aren't implemented for user mode, which is OK until we multi-thread 
> */
> +typedef struct _atomic {
> +   uint32_t counter;
> +} atomic_t;/* no atomic_t type in user-land */
> +#define atomic_set(a,v) ((a)->counter = (v))
> +#define atomic_inc_return(a)  (++(a)->counter)
> +#define likely(x) (x)
> +#define unlikely(x) (x)
> +
> +#define yield() sched_yield()
> +
> +/*
> + * too horrible to try and use the kernel get_cycles() or equivalent,
> + * so define and inline it here
> + */
> +
> +#if !defined(rdtscll)
> +#if defined(__x86_64) || defined(__i386)
> +#define rdtscll(v) do {uint32_t a,d;asm volatile("rdtsc" : "=a" (a), "=d" 
> (d)); \
> + (v) = ((uint64_t)a) | (((uint64_t)d)<<32); \
> +} while(0)
> +#else
> +#error "No cycle counter routine implemented yet for this platform"
> +#endif
> +#endif /* !defined(rdtscll) */

Do we really need this ugly userspace emulation code in the kernel?

> +/*
> + * this is used for very short copies, usually 1 - 8 bytes,
> + * *NEVER* to the PIO buffers!!!  use ipath_dwordcpy for longer
> + * copies, or any copy to the PIO buffers.  Works for 32 and 64 bit
> + * gcc and pathcc
> + */
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> +{
> +   void *ssv, *dsv;
> +   uint32_t csv;
> +   __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> +"=&S"(ssv)
> +:"0"(cnt), "1"(dest), "2"(src)
> +:"memory");
> +}
> +
> +/*
> + * optimized word copy; good for rev C and later opterons.  Among the best 
> for
> + * short copies, and does as well or slightly better than the optimizization
> + * guide copies 6 and 8 at 2KB.
> + */
> +void ipath_dwordcpy(uint32_t * dest, uint32_t * src, uint32_t ndwords);

What is this used for? Why can't yo use memcpy?

> +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))

Please use ALIGN().

> +/* These are used in the driver, don't use them elsewhere */
> +#define _IPATH_SIMFUNC_IOCTL_LOW 1
> +#define _IPATH_SIMFUNC_IOCTL_HIGH 7
> +
> +/*
> + * These tell the driver which ioctl's belong to the diags interface.
> + * As above, don't use them elsewhere.
> + */
> +#define _IPATH_DIAG_IOCTL_LOW 100
> +#define _IPATH_DIAG_IOCTL_HIGH 109

[snip, snip]

You seem to be introducing loads of new ioctls. Any reason you can't
use sysfs and/or configfs?

> +/* macros for processing rcvhdrq entries */
> +#define ips_get_hdr_err_flags(StartOfBuffer) *(((uint32_t 
> *)(StartOfBuffer))+1)
> +#define ips_get_index(StartOfBuffer) (((*((uint32_t *)(StartOfBuffer))) >> \
> +INFINIPATH_RHF_EGRINDEX_SHIFT) & INFINIPATH_RHF_EGRINDEX_MASK)
> +#define ips_get_rcv_type(StartOfBuffer)  ((*(((uint32_t *)(StartOfBuffer))) 
> >> \
> +INFINIPATH_RHF_RCVTYPE_SHIFT) & INFINIPATH_RHF_RCVTYPE_MASK)
> +#define ips_get_length_in_bytes(StartOfBuffer) \
> +(uint32_t)(((*(((uint32_t *)(StartOfBuffer))) >> \
> +INFINIPATH_RHF_LENGTH_SHIFT) & INFINIPATH_RHF_LENGTH_MASK) << 2)
> +#define ips_get_first_protocol_header(StartOfBuffer) (void *) \
> +((uint32_t *)(StartOfBuffer) + 2)
> +#define ips_get_ips_header(StartOfBuffer) ((ips_message_header_typ *) \
> +((uint32_t *)(StartOfBuffer) + 2))
> +#define ips_get_ipath_ver(ipath_header) (((ipath_header) >> 
> INFINIPATH_I_VERS_SHIFT) \
> +& INFINIPATH_I_VERS_MASK)

Please use static inlines instead for readability.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory

2005-12-17 Thread Ian Jiang
Hi Grant,
Thanks very much.
I scanned the IO-mapping.txt, DMA-API.txt and DMA-mapping.txt as soon as I could and have got a main concept now.
As your mentioned, ULPs in OpenIB (e.g. SDP or IPoIB) are responsible
for properly mapping and unmapping for DMA use. AFAIK, SDP is
implemented with the IB native verbs. What about the kDAPL? In my
opinion the kDAPL does not do the mapping and unmapping work. So it is
the responsibility of the kernel applications using the kDAPL. Am I
right?On 12/17/05, Grant Grundler <[EMAIL PROTECTED]> wrote:
While IO-mapping.txt gives a nice introduction into the topicof "bus addresses", the answer to the question lies inDocumentation/DMA-API.txt. IO devices can only use "bus addresses"that are handed back by the interfaces described in 
DMA-API.txt.For OpenIB, ULPs (e.g. SDP or IPoIB) are responsible for properlymapping and unmapping for DMA use.While many architectures don't use IOMMU (and thus have 1:1between host physical:bus address), virtualization seems to be
forcing the issue in the "near" future. All DMA access will needto be enforced to isolate virtualized guests. This is somethingsome platforms with IOMMUs enforce today (e.g. Sparc64, PPC64 and PA-RISC).
hth,grant-- Ian Jiang[EMAIL PROTECTED]Laboratory of Spatial Information TechnologyDivision of System Architecture
Institute of Computing TechnologyChinese Academy of Sciences
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general