RE: [PATCH v2 00/14] RDS: connection scalability and performance improvements

2015-10-01 Thread David Laight
From: Santosh Shilimkar
> Sent: 30 September 2015 18:24
...
> This is being addressed by simply using per bucket rw lock which makes the
> locking simple and very efficient. The hash table size is still an issue and
> I plan to address it by using re-sizable hash tables as suggested on the list.

If the hash chains are short do you need the expense of a rw lock
for each chain?
A simple spinlock may be faster.

If you use the hash chain lock for the reference count on the hashed
objects you should be able to release the lock before locking the
object itself.

David

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/10] use safer test on the result of find_first_zero_bit

2014-06-04 Thread David Laight
From: Julia Lawall
> On Wed, 4 Jun 2014, Geert Uytterhoeven wrote:
> 
> > Hi Julia,
> >
> > On Wed, Jun 4, 2014 at 1:00 PM, Julia Lawall  wrote:
> > > OK, thanks.  I was only looking at the C code.
> > >
> > > But the C code contains a loop that is followed by:
> > >
> > > if (!size)
> > > return result;
> > > tmp = *p;
> > >
> > > found_first:
> > > tmp |= ~0UL << size;
> > > if (tmp == ~0UL)/* Are any bits zero? */
> > > return result + size;   /* Nope. */
> > >
> > > In the first return, it would seem that result == size.  Could the second
> > > one be changed to just return size?  It should not hurt performance.
> >
> > "size" may have been changed between function entry and this line.
> > So you have to store it in a temporary.
> 
> Sorry, after reflection it seems that indeed size + result is always the
> original size, so it is actually all of the code that uses >= that is
> doing something unnecessary.  == for the failure test is fine.

There is nothing wrong with defensive coding.
The 'tmp |= ~0UL << size' ensures that the return value is 'correct'
when there are no bits set.
The function could have been defined so that this wasn't needed.

If you assume that the 'no zero bits' is unlikely, then checking the
return value from ffz() could well be slightly faster.
Not that anything is likely to notice.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/10] use safer test on the result of find_first_zero_bit

2014-06-04 Thread David Laight
From: Julia Lawall
> On Wed, 4 Jun 2014, Geert Uytterhoeven wrote:
> 
> > Hi Julia,
> >
> > On Wed, Jun 4, 2014 at 11:07 AM, Julia Lawall  wrote:
> > > Find_first_zero_bit considers BITS_PER_LONG bits at a time, and thus may
> > > return a larger number than the maximum position argument if that position
> > > is not a multiple of BITS_PER_LONG.
> >
> > Shouldn't this be fixed in find_first_zero_bit() instead?
> 
> OK, I could do that as well.  Most of the callers currently test with >=.
> Should they be left as is, or changed to use ==?

Do we want to add an extra test to find_first_zero_bit() and effectively
slow down all the calls - especially those where the length is a
multiple of 8 (probably the most common).

Maybe the documented return code should be changed to allow for the
existing behaviour.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [patch] RDMA/cxgb4: info leak in c4iw_alloc_ucontext()

2014-03-28 Thread David Laight
From: Yann Droneaud
> Hi,
> 
> Le vendredi 28 mars 2014  11:24 +0300, Dan Carpenter a crit :
> > The c4iw_alloc_ucontext_resp struct has a 4 byte hole after the last
> > member and we should clear it before passing it to the user.
> >
> > Fixes: 05eb23893c2c ('cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes')
> > Signed-off-by: Dan Carpenter 
> >
> 
> It's not the proper fix for this issue: an explicit padding has to be
> added (and initialized), see "Re: [PATCH net-next 2/2] cxgb4/iw_cxgb4:
> Doorbell Drop Avoidance Bug Fixes"
> http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
> 
> In its current form, the c4iw_alloc_ucontext_resp structure does not
> require padding on i386, so a 32bits userspace program using this
> structure against a x86_64 kernel will make the kernel do a buffer
> overflow in userspace, likely on stack, as answer of a GET_CONTEXT
> request:
...
> struct c4iw_alloc_ucontext_resp {
> struct ibv_get_context_resp ibv_resp;
> __u64 status_page_key;
> __u32 status_page_size;
> };

Or add __attribute__((aligned(4))) to the 64bit fields.
And maybe a compile time assert on the length of the structure.
Since it is part of an ABI it must not change

David



RE: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-04 Thread David Laight
> > It seems to me that a more useful interface would take a minimum and
> > maximum number of vectors from the driver.  This wouldn't allow the
> > driver to specify that it could only accept, say, any even number within
> > a certain range, but you could still leave the current functions
> > available for any driver that needs that.
> 
> Mmmm.. I am not sure I am getting it. Could you please rephrase?

One possibility is for drivers than can use a lot of interrupts to
request a minimum number initially and then request the additional
ones much later on.
That would make it less likely that none will be available for
devices/drivers that need them but are initialised later.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/2] fs: supply inode uid/gid setting interface

2013-08-27 Thread David Laight
> Subject: Re: [PATCH 0/2] fs: supply inode uid/gid setting interface
> 
> On 2013/8/23 12:10, Greg KH wrote:
> > On Fri, Aug 23, 2013 at 10:48:36AM +0800, Rui Xiang wrote:
> >> This patchset implements an accessor functions to set uid/gid
> >> in inode struct. Just finish code clean up.
> >
> > Why?
> >
> It can introduce a new function to reduce some codes.
>  Just clean up.

In what sense is it a 'cleanup' ?

To my mind it just means that anyone reading the code has
to go and look at another file in order to see what the
function does (it might be expected to be more complex).

It also adds run time cost, while probably not directly
measurable I suspect it more than doubles the execution
time of that code fragment - do that everywhere and the
system will run like a sick pig.

The only real use for accessor functions is when you
don't want the structure offset to be public.
In this case that is obviously not the case since
all the drivers are directly accessing other members
of the structure.

David



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 net-next 05/22] cxgb4: Add T5 write combining support

2013-03-14 Thread David Laight
> This patch implements a low latency Write Combining (aka Write Coalescing) 
> work
> request path. PCIE maps User Space Doorbell BAR2 region writes to the new
> interface to SGE. SGE pulls a new message from PCIE new interface and if its a
> coalesced write work request then pushes it for processing. This patch copies
> coalesced work request to memory mapped BAR2 space.
...
> + } else {
> + memset(data, 0, 2 * sizeof(u64));
> + *data += 2;
> + }

Using memset is overkill (or rather a big overhead if it isn't
detected by the compiler). Nothing wrong with:
(*data)[0] = 0;
(*data)[1] = 0;
*data += 2;
Actually, typing that, I realise that you should probably have
read *data into a local variable, and then updated it when finished.
Otherwise some of the accesses might be ones which force the
compiler to reload the value.

>  static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
>  {
> + unsigned int *wr, index;
> +
>   wmb();/* write descriptors before telling HW */
>   spin_lock(&q->db_lock);
>   if (!q->db_disabled) {
> - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
> -  QID(q->cntxt_id) | PIDX(n));
> + if (is_t4(adap->chip)) {
> + t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
> +  QID(q->cntxt_id) | PIDX(n));
> + } else {
> + if (n == 1) {
> + index = q->pidx ? (q->pidx - 1) : (q->size - 1);
> + wr = (unsigned int *)&q->desc[index];
> + cxgb_pio_copy((u64 __iomem *)
> +   (adap->bar2 + q->udb + 64),
> +   (u64 *)wr);

Why all the casts on 'wr' ?

> + } else
> + writel(n,  adap->bar2 + q->udb + 8);
> + wmb();

Since you actually need memory barriers here on x86 you definitely
need a comment saying so, and it would (IMHO) better to use a
different define in the source (even if it is currently converted
to wmb() in a local header file).

Thinking further, for portability you might want to find some way
of abstracting the multi-word writes somehow.
For example, some of the ppc have a dma engine associated with the
PCIe master interface that can be used to generate large TLP.
The code would still want to spin waiting for the dma to complete
(since the transfer would be far faster than any interrupt path).

David



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next 05/22] cxgb4: Add T5 write combining support

2013-03-13 Thread David Laight
> >>> + writel(n,  adap->bar2 + q->udb + 8);
> >>> +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
> >>> + asm volatile("sfence" : : : "memory");
> >>> +#endif
> >> There is absolutely no way I'm letting anyone put crap like this
> >> into a driver.
> >>
> >> Use a portable inteface, and if one does not exist create one.
> >
> > I guess you'll have to add a wc_wmb() function for all the hw platforms
> > supported by the kernel.  I see libibverbs defines this for the user
> > side in include/infiniband/arch.h, and that could be used as the meat of
> > the hw platform-specific implementations.
> >
> I see that normal wmb() code for x86_64 architecture is same as what
> above #ifdef condition is doing. To be more clear I looked at the
> assembly code for wmb and find that it is converted into 'sfence'
> instruction. So, I think above code should be replaced with wmb call
> which will also take care of portability on different architecture. I
> will submit the series again soon.

>From my recollection of the x86 architecture, the memory barriers
are hardly ever needed, certainly not in the places where, for example
a ppc needs them. I'd actually suspect that the normal wmb() for
x86 should be a nop.

About the only place where any on the fence instructions are needed
are in relation to write combining accesses.
In particular I don't believe they are ever needed to synchronise
uncached accesses with each other, or with cached accesses (which
are snooped).

David




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/9] ocrdma: Driver for Emulex OneConnect RDMA adapter

2012-03-23 Thread David Laight
 
> > struct {
> >   u64 fielda
> > u32 field;
> > };
> 
> In this case: On 64 bit: the u64 is aligned to 8 and the u32 
> is aligned to 4. So
> the structure is aligned to 8. A pad is inserted at the end of the
> struct to bring it out. On 32 bit, the u64 is aligned to 4, so the
> struct is aligned to 4, so no pad is added.

That is true for 32bit i386, some 32bit ABI may require 64bit
alignment for 64bit items.

...
> The aligned attribute overrides the automatic determination of the
> alignment based on the contents and just forces it.
> 
> So, as an example, if you have this hardware layout:
> 
> struct {
>   u32 fielda;
>   u64 fieldb;
> } attribute ((aligned(4));
> 
> David is saying you will get a 12 byte struct and fieldb will be
> unaligned. Since 12 is aligned to 4 no padding is added.

I was actually suggesting defining (modulo syntax errors):
typedef u64 u64_aligned_8 attribute ((aligned(8));

Then you can define:
struct foo {
u32 fielda;
u64_aligned_8 fieldb;
};
and know that the aligment should be the same on all systems.
(But I'd still add the explicit pad in the above.)

This is actually more useful when you need to process structures
that have misaligned 64bit items - as happens when doing 'compat 32'
code in a 64bit kernel. Or trying to map the x86 disk mbr.

> For hardware facing structures I'd combine this with a static assert
> to verify structure size at compile time.

Also useful if you need to ensure a structure doesn't accidentally
change size when you need to preserve a userspace interface.

> So..
> 
> 1) Avoid using attributes unless the structure has unaligned members.
> 2) Avoid creating structures with unaligned members (eg for userspace
>communication)
> 3) Frown at hardware/firmware developers who make communication
>structures with unaligned members :)
> 4) Be explicit about padding in your layout for 64/32
>compatibility.

Agreed.

David
 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/9] ocrdma: Driver for Emulex OneConnect RDMA adapter

2012-03-21 Thread David Laight
 
> > +#define is_cqe_wr_imm(cqe) \
> > +       ((le32_to_cpu(cqe->flags_status_srcqpn) & OCRDMA_CQE_WRITE_IMM) ? 1 
> > : 0)
> 
> ...similar comment about using readable typesafe inline functions
> instead of macros...

and if you are using #defines, you need to enclose every reference
to the parameters in ().

David


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/9] ocrdma: Driver for Emulex OneConnect RDMA adapter

2012-03-21 Thread David Laight
 
> > - Header file for userspace library and kernel driver interface.
> 
> > +struct ocrdma_alloc_ucontext_resp {
> > +       u32 dev_id;
> > +       u32 wqe_size;
> > +       u32 max_inline_data;
> > +       u32 dpp_wqe_size;
> > +       u64 ah_tbl_page;
> > +       u32 ah_tbl_len;
> > +       u32 rsvd;
> > +       u8 fw_ver[32];
> > +       u32 rqe_size;
> > +       u64 rsvd1;
> > +} __packed;
> 
> If I'm reading this correctly, you have the 8-byte rsvd1 member at an
> offset only aligned to 4 bytes, because of the __packed directive.  It
> would be much better to have these structures laid out so they are
> naturally the same on both 32-bit and 64-bit ABIs, and get rid of the
> __packed directive, which wrecks gcc code generation in some cases.
> 

gcc also supports defining types that have non-standard alignment
constraints that can be used to force the same alignment for
64bit fields between i386 and amd64.
Probably __attribute__((aligned,n)) or similar.

This can be used to force 32bit alignment in amd64 code in order
to match definitions in 32bit userspace.
For new things it would make sense to force 64bit alignment
of 64bit fields for 32bit code.

Adding __packed (rather than 32bit alignment) forces the compiler
to generate byte by byte accesses for all the fields on systems
that can't do misaligned accesses in hardware (eg sparc).

David


David


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html