Fwd: [ewg] Re: [PATCH v3] mlx4_ib: Optimize hugetlb pages support

2009-03-31 Thread Olga Shern (Voltaire)
Vlad,

Can you please replace mlx4_1070-optimize-huge_tlb.patch
with Yossi's patch? It fixes Eli's patch.

Thanks
Olga


-- Forwarded message --
From: Yossi Etigin 
Date: Mon, Mar 30, 2009 at 6:49 PM
Subject: [ewg] Re: [PATCH v3] mlx4_ib: Optimize hugetlb pages support
To: Eli Cohen 
Cc: Roland Dreier , ewg
, general-list



Eli Cohen wrote:
>
> Since Linux may not merge adjacent pages into a single scatter entry through
> calls to dma_map_sg(), we check the special case of hugetlb pages which are
> likely to be mapped to contiguous dma addresses and if they are, take advantage
> of this. This will result in a significantly lower number of MTT segments used
> for registering hugetlb memory regions.
>

How about the one below? It fixes bugzilla #1569 (fix mapping for a
size that is not on a page boundary):

---

Since Linux may not merge adjacent pages into a single scatter entry through
calls to dma_map_sg(), we check the special case of hugetlb pages which are
likely to be mapped to contiguous dma addresses and if they are, take advantage
of this. This will result in a significantly lower number of MTT segments used
for registering hugetlb memory regions.

Signed-off-by: Eli Cohen 
---
drivers/infiniband/hw/mlx4/mr.c |   81 ++
1 files changed, 72 insertions(+), 9 deletions(-)

Index: b/drivers/infiniband/hw/mlx4/mr.c
===
--- a/drivers/infiniband/hw/mlx4/mr.c   2008-11-19 21:32:15.0 +0200
+++ b/drivers/infiniband/hw/mlx4/mr.c   2009-03-30 18:29:55.0 +0300
@@ -119,6 +119,70 @@ out:
       return err;
}

+static int handle_hugetlb_user_mr(struct ib_pd *pd, struct mlx4_ib_mr *mr,
+                                 u64 start, u64 virt_addr, int access_flags)
+{
+#if defined(CONFIG_HUGETLB_PAGE) && !defined(__powerpc__) && !defined(__ia64__)
+       struct mlx4_ib_dev *dev = to_mdev(pd->device);
+       struct ib_umem_chunk *chunk;
+       unsigned dsize;
+       dma_addr_t daddr;
+       unsigned cur_size = 0;
+       dma_addr_t uninitialized_var(cur_addr);
+       int n;
+       struct ib_umem  *umem = mr->umem;
+       u64 *arr;
+       int err = 0;
+       int i;
+       int j = 0;
+       int off = start & (HPAGE_SIZE - 1);
+
+       n = DIV_ROUND_UP(off + umem->length, HPAGE_SIZE);
+       arr = kmalloc(n * sizeof *arr, GFP_KERNEL);
+       if (!arr)
+               return -ENOMEM;
+
+       list_for_each_entry(chunk, &umem->chunk_list, list)
+               for (i = 0; i < chunk->nmap; ++i) {
+                       daddr = sg_dma_address(&chunk->page_list[i]);
+                       dsize = sg_dma_len(&chunk->page_list[i]);
+                       if (!cur_size) {
+                               cur_addr = daddr;
+                               cur_size = dsize;
+                       } else if (cur_addr + cur_size != daddr) {
+                               err = -EINVAL;
+                               goto out;
+                       } else
+                               cur_size += dsize;
+
+                       if (cur_size > HPAGE_SIZE) {
+                               err = -EINVAL;
+                               goto out;
+                       } else if (cur_size == HPAGE_SIZE) {
+                               cur_size = 0;
+                               arr[j++] = cur_addr;
+                       }
+               }
+
+       if (cur_size) {
+               arr[j++] = cur_addr;
+       }
+
+       err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, umem->length,
+                           convert_access(access_flags), n, HPAGE_SHIFT, &mr->mmr);
+       if (err)
+               goto out;
+
+       err = mlx4_write_mtt(dev->dev, &mr->mmr.mtt, 0, n, arr);
+
+out:
+       kfree(arr);
+       return err;
+#else
+       return -ENOSYS;
+#endif
+}
+
struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
                                 u64 virt_addr, int access_flags,
                                 struct ib_udata *udata)
@@ -140,17 +204,20 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct
               goto err_free;
       }

-       n = ib_umem_page_count(mr->umem);
-       shift = ilog2(mr->umem->page_size);
-
-       err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
-                           convert_access(access_flags), n, shift, &mr->mmr);
-       if (err)
-               goto err_umem;
-
-       err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
-       if (err)
-               goto err_mr;
+       if (!mr->umem->hugetlb ||
+           handle_hugetlb_user_mr(pd, mr, start, virt_addr, access_flags)) {
+               n = ib_umem_page_count(mr->umem);
+               shift = ilog2(mr->umem->page_size);
+
+               err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
+                                   convert_access(access_flags), n, shift, &mr->mmr);
+               if (err)
+             
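
(The archived patch is cut off at this point.) The core idea of handle_hugetlb_user_mr() can be sketched in user space as below. This is only an illustration under stated assumptions: struct dma_seg and coalesce_hugepages() are hypothetical names that stand in for the scatterlist entries and the loop in the patch, and the huge page size is passed in as a parameter instead of using HPAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
    uint64_t addr;  /* bus address of the segment */
    uint32_t len;   /* length in bytes */
};

/*
 * Collapse DMA segments into one start address per huge page.
 * Returns the number of entries written to out, or -1 when the
 * segments are not contiguous inside a huge page, overrun a huge
 * page, or do not fit in out.
 */
static int coalesce_hugepages(const struct dma_seg *segs, size_t nsegs,
                              uint64_t hpage_size,
                              uint64_t *out, size_t out_cap)
{
    uint64_t cur_addr = 0;
    uint64_t cur_size = 0;
    size_t j = 0;

    assert(segs != NULL && out != NULL && hpage_size != 0);

    for (size_t i = 0; i < nsegs; ++i) {
        if (cur_size == 0) {
            /* First segment of a new huge page. */
            cur_addr = segs[i].addr;
            cur_size = segs[i].len;
        } else if (cur_addr + cur_size != segs[i].addr) {
            return -1;  /* gap: not physically contiguous */
        } else {
            cur_size += segs[i].len;
        }

        if (cur_size > hpage_size)
            return -1;  /* run crosses a huge page boundary */
        if (cur_size == hpage_size) {
            if (j == out_cap)
                return -1;
            out[j++] = cur_addr;
            cur_size = 0;
        }
    }

    /* Trailing partial huge page (size not on a page boundary). */
    if (cur_size) {
        if (j == out_cap)
            return -1;
        out[j++] = cur_addr;
    }
    return (int)j;
}
```

In the patch, a failure of this kind makes mlx4_ib_reg_user_mr() fall back to the regular per-page registration path, so the check only needs to be conservative, not exhaustive.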

***SPAM*** Re: [ewg] Re: [Bug 1546] Compilation on SLES11 (x86_64 archi') failed

2009-03-17 Thread Olga Shern (Voltaire)
On Tue, Mar 17, 2009 at 6:25 PM, Jeff Becker  wrote:
> Hi. Yossi Etigin provided an installer patch that Vlad applied
> yesterday. Does the ia64 problem happen with this patch? Thanks.
>
> -jeff

Hi Jeff,

The patch should also fix the compilation issue on ia64.
The patch wasn't applied in today's daily build; we are going to test
SLES 11 compilation on all architectures tomorrow.

Olga
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


***SPAM*** Re: [ewg] RE: Delaying next Monday OFED meeting

2009-03-05 Thread Olga Shern (Voltaire)
Both dates are OK with us

On Thu, Mar 5, 2009 at 4:02 PM, John Russo  wrote:
> Let’s go for the 12th.
>
>
>
> From: ewg-boun...@lists.openfabrics.org
> [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tziporet Koren
> Sent: Thursday, March 05, 2009 7:23 AM
> To: ewg@lists.openfabrics.org
> Subject: [ewg] Delaying next Monday OFED meeting
>
>
>
> Hello,
>
> Due to Purim holiday in Israel I wish to delay the next Monday OFED meeting.
>
> We can do it next week on Thursday (12 March) 9am PST or delay to a week
> after on Monday (March 16 ) 9am PST
>
> Can you reply with your availability?
>
> Sorry for the inconvenience.
>
> Tziporet
>


***SPAM*** Re: [ewg] OFED (EWG) meeting agenda for tomorrow (Jan 26)

2009-01-25 Thread Olga Shern (Voltaire)
>
> 3. OFED 1.5 schedule
>
> Betsy from Qlogic suggested moving the release earlier.
>
> On the other hand, Olga from Voltaire asked to stay with the July time
> frame.
>
> Based on the decisions in 1 & 2 we should decide on the release schedule.

We should decide whether we want to have one or two OFED releases per year.
If we decide to go for one OFED release per year, I think we should
postpone the OFED 1.5 release to October and have a dot release in
the middle.


***SPAM*** Re: [ewg] OFED Jan 5, 2009 meeting minutes on OFED plans

2009-01-08 Thread Olga Shern (Voltaire)
>- Kernel base will be 2.6.29

Hi,

The kernel 2.6.29 window will close very soon, which means that we
cannot get any new features into this kernel.
Therefore there would be no new features in OFED 1.5.
I think we should base it on 2.6.30.
And I agree with Tziporet regarding the OFED 1.5 schedule: no need to
rush. OFED is mature enough, so there is no need to have releases every
half year.

Olga


[ewg] RE: [PATCH] libsdp: Add epoll support

2008-12-09 Thread Olga Shern
Amir,

Thanks, this is OK.

Olga

-Original Message-
From: Amir Vadai [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 09, 2008 1:28 PM
To: Yossi Etigin
Cc: OF-EWG; Olga Shern
Subject: Re: [PATCH] libsdp: Add epoll support

Hi,


It is too late for 1.4 release.


will be in the next release.


- Amir


Amir Vadai wrote:

> Thanks,
>
>
> It looks fine - I will try to make it for the release.
>
>
> - Amir.
>
>
> Yossi Etigin wrote:
>
>   
>> Add epoll_create/epoll_ctl/epoll_wait/epoll_pwait support to libsdp.
>> When creating epoll set, make it twice as large, for shadow fd data.
>> When doing epoll_ctl, perform the same operation on shadow fd as well.
>> (If the shadow fd is closed, it will be automatically removed from epoll
>> set).
>> When waiting (epoll_wait or epoll_pwait), no need to perform special
>> work, since it returns user data and not file descriptors.
>>
>> Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>
>>
>> -- 
>>
>> diff --git a/src/port.c b/src/port.c
>> index 340c2a5..9cf73a5 100644
>> --- a/src/port.c
>> +++ b/src/port.c
>> @@ -49,6 +49,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> /*
>>  * SDP specific includes
>>  */
>> @@ -175,6 +176,33 @@ typedef int (
>> unsigned long int nfds, int timeout);
>>
>> +typedef int (
>> +*epoll_create_func_t ) (
>> +int size);
>> +
>> +typedef int (
>> +*epoll_ctl_func_t ) (
>> +int epfd,
>> +int op,
>> +int fd,
>> +struct epoll_event *event);
>> +
>> +typedef int (
>> +*epoll_wait_func_t ) (
>> +int epfd,
>> +struct epoll_event *events,
>> +int maxevents,
>> +int timeout);
>> +
>> +typedef int (
>> +*epoll_pwait_func_t ) (
>> +int epfd,
>> +struct epoll_event *events,
>> +int maxevents,
>> +int timeout,
>> +const sigset_t *sigmask);
>> +
>> +
>> struct socket_lib_funcs
>> {
>> ioctl_func_t ioctl;
>> @@ -193,6 +221,10 @@ struct socket_lib_funcs
>> select_func_t select;
>> pselect_func_t pselect;
>> poll_func_t poll;
>> +epoll_create_func_t epoll_create;
>> +epoll_ctl_func_t epoll_ctl;
>> +epoll_wait_func_t epoll_wait;
>> +epoll_pwait_func_t epoll_pwait;
>> };  /* socket_lib_funcs */
>>
>> static int simple_sdp_library;
>> @@ -2506,6 +2538,137 @@ poll(
>> }  /* poll */
>>
>> /* ========================================================================= */
>> +/*..epoll_create -- replacement socket call. */
>> +/*
>> +   Need to make the size twice as large for shadow fds
>> +*/
>> +int
>> +epoll_create(
>> +int size )
>> +{
>> +int epfd;
>> +
>> +if (init_status == 0) __sdp_init();
>> +
>> +if ( NULL == _socket_funcs.epoll_create ) {
>> +__sdp_log( 9, "Error epoll_create: no implementation for epoll_create found\n" );
>> +return -1;
>> +}
>> +
>> +__sdp_log( 2, "EPOLL_CREATE: <%s:%d>\n",
>> +  program_invocation_short_name, size );
>> +
>> +epfd = _socket_funcs.epoll_create( size * 2 );
>> +
>> +__sdp_log( 2, "EPOLL_CREATE: <%s:%d> return %d\n",
>> +  program_invocation_short_name, size, epfd );
>> +return epfd;
>> +}  /* epoll_create */
>> +
>> +/* ========================================================================= */
>> +/*..epoll_ctl -- replacement socket call. */
>> +/*
>> +   Need to add/delete/modify shadow fds as well
>> +*/
>> +int
>> +epoll_ctl(
>> +int epfd,
>> +int op,
>> +int fd,
>> +struct epoll_event *event )
>> +{
>> +int ret, shadow_fd, ret2;
>> +
>> +if (init_status == 0) __sdp_init();
>> +
>> +if ( NULL == _socket_funcs.epoll_ctl ) {
>> +__sdp_log( 9, "Error epoll_ctl: no implementation for epoll_ctl found\n" );
>> +return -1;
>> +}
>> +
>> +__sdp_log( 2, "EPOLL_CTL: <%s:%d> op <%d:%d>\n",
>> +  program_invocation_short_name, epfd, op, fd );
>> +
>> +ret = _socket_
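
(The quoted patch is cut off here in the archive.) As a rough illustration of the approach described in the commit message, the sketch below doubles the epoll set size and mirrors each epoll_ctl() call onto a shadow fd. The helper names epoll_create_for_shadow() and epoll_ctl_with_shadow() are hypothetical, the shadow fd is passed in explicitly instead of being looked up in libsdp's tables, and tolerating a failed delete on an already closed shadow fd is an assumption, not something the visible part of the patch confirms.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Ask for twice the requested size so every fd can have a shadow. */
static int epoll_create_for_shadow(int size)
{
    return epoll_create(size * 2);
}

/*
 * Apply one epoll_ctl() operation to a socket and, when present,
 * to its shadow fd. shadow_fd < 0 means the socket has no shadow.
 */
static int epoll_ctl_with_shadow(int epfd, int op, int fd, int shadow_fd,
                                 struct epoll_event *event)
{
    int ret = epoll_ctl(epfd, op, fd, event);

    if (ret < 0 || shadow_fd < 0)
        return ret;

    int ret2 = epoll_ctl(epfd, op, shadow_fd, event);

    /*
     * Assumption: a closed shadow fd has already dropped out of the
     * epoll set, so a failed delete on it is not reported as an error.
     */
    if (ret2 < 0 &&
        !(op == EPOLL_CTL_DEL && (errno == ENOENT || errno == EBADF)))
        return ret2;
    return 0;
}
```

Because epoll_wait() hands back the user data stored in struct epoll_event and not the fd itself, an event on the shadow fd looks to the caller exactly like an event on the original socket, which is why the wait path needs no special handling.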

***SPAM*** Re: [ewg] OFED Nov 24, 2008 meeting minutes

2008-11-27 Thread Olga Shern (Voltaire)
>
> OFED 1.4 release: RC6 on Nov 28, GA on Dec 8

Hi,

Are you going to build RC6 today/tomorrow?
I see that there are still a lot of major bugs. Maybe we should wait?

Olga


***SPAM*** Re: [ewg] RE: Do we have an EWG meeting today?

2008-11-24 Thread Olga Shern (Voltaire)
I got email from Jeff:

Friendly reminder: the OFED teleconference is today (24 November 2008).

1. Noon US Eastern / 9am US Pacific / 7pm Israel
  Monday, November 24, code 210020028 (*** TODAY ***)
2. Noon US Eastern / 9am US Pacific / 7pm Israel
  Monday, December 1, code 210020028

US/Canada:  +1.866.432.9903
India:  +91.80.4103.3979
Israel: +972.9.892.7026
Others: http://cisco.com/en/US/about/doing_business/conferencing/



On Mon, Nov 24, 2008 at 6:57 PM, Woodruff, Robert J
<[EMAIL PROTECTED]> wrote:
> I can set up a bridge number if we want to meet.
>
> woody
>
>
> -Original Message-
> From: Tziporet Koren [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2008 5:10 AM
> To: Woodruff, Robert J; Betsy Zeller; Olga Shern
> Subject: Do we have an EWG meeting today?
>
> I thought we decided to have one but I don't see such a meeting in my calendar
>
> Tziporet


***SPAM*** Re: [ewg] OFED 1.4 - delay the GA to Dec 4

2008-11-20 Thread Olga Shern (Voltaire)
>
> 1370 blo [EMAIL PROTECTED] Ping over IPoIB I/F fails
> after ifconfig down and up
>

Yossi has sent a patch that fixes this.

> 1198 cri [EMAIL PROTECTED] hang during ipoib
> create_child/ifdown

We sent a patch to Roland some time ago. But it was decided in the EWG meeting that
because:
 1. A user will rarely run such a test
 2. This is an old bug that wasn't introduced in OFED 1.4
we will not add the patch to OFED 1.4.

If you think this is another bug we should open a new one


> 1289 maj [EMAIL PROTECTED] Ib and ipoib doesnt respond while
> running multiple tests ...
>

It seems that this was already fixed - need only retest this and
verify that this is indeed fixed


***SPAM*** Re: [ewg] OFED 1.4 bugs status and OFED meetings

2008-11-17 Thread Olga Shern (Voltaire)
Hi Vlad,

Was this bug fixed: 1349 maj [EMAIL PROTECTED] Kernel panic on sdp?

Olga


Re: [ewg] rhel4.6 testing

2008-11-07 Thread Olga Shern (Voltaire)
I assume you mean OFED 1.4
We have tested it - regression tests.
Do you see any problem?

On Thu, Nov 6, 2008 at 9:12 PM, Steve Wise <[EMAIL PROTECTED]> wrote:
> Has anyone tested the core rdma stuff on rhel4.6?
> Thanks,
>
> Steve.


RE: [ewg] OFED Nov 3 2008 meeting summary on OFED 1.4 status

2008-11-05 Thread Olga Shern
Hi,

We have fixed bug #1335; Moni released a new bonding version yesterday.

Regarding bug #1301, we can see that it behaves differently on different
machines (with the same OS and arch). We will have more info today.

Regarding bug #1336, we think we know what causes the bug, but we
don't have a solution. This bug is important because it is a regression:
it worked fine before RC3. If someone can help with this, it would be
great.

I also think that it is better to postpone RC4 to November 10 and have
GA one week after it, if we do not find any critical bugs.

Olga

 
-Original Message-
From: Betsy Zeller [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 06, 2008 5:25 AM
To: Vladimir Sokolovsky
Cc: Olga Shern; Stefan Roscher; [EMAIL PROTECTED]; OF-EWG;
Tziporet Koren
Subject: Re: [ewg] OFED Nov 3 2008 meeting summary on OFED 1.4 status

Vlad - Yannick Cote will check in a fix tonight  for 1326, plus some
other driver fixes. Our internal testing indicates that this will also
fix 1283.

We do not currently have a fix for 1242, but are continuing to work on
it. Given that IBMs fix isn't scheduled to come in till tomorrow, and
that I haven't yet seen a response from the folks at Voltaire, I'd
recommend holding off RC4 till Monday, Nov 10.

Thanks, Betsy


On Wed, 2008-11-05 at 21:29 +0200, Vladimir Sokolovsky wrote:
> Meeting Summary:
> > ==
> > RC4 is delayed - will be released on Thursday Nov 6.
> >
> > Details:
> > ===
> > Bugs to be fixed in RC4:
> >
> > 1283  blocker   P1  RHEL 5  [EMAIL PROTECTED]  NEW
> > Intel MPI fails on Qlogc HCA
> > 1326  blocker   P1  RHEL 4  [EMAIL PROTECTED]  NEW
> > ipath driver fails to build on IA64 in the 10/28/08 daily build
> > 1335  major     P3  Other   [EMAIL PROTECTED]  NEW
> > Bonding: packet lost during failover
> > 1301  major     P3  RHEL 4  [EMAIL PROTECTED]  NEW
> > Can not load rds module on RH4 up7
> > 1323  blocker   P1  All     [EMAIL PROTECTED]  REOPENED
> > IB/ehca: possibillity of kernel panic under certain circumstances
> > 1242  critical  P2  RHEL 4  [EMAIL PROTECTED]  NEW
> > kernel panic while running mpi2007 against ofed1.4 -- ib_ipath: ipath_sdma_verbs_send
> > 1336  critical  P1  RHEL 5  [EMAIL PROTECTED]  NEW
> > Can't to unloading the mlx4_ib module on ppc64
> >
> 
> 
> Hi all,
> I see that the number of critical issues did not decrease.
> Do you think we should delay RC4 till Monday Nov 10, or do you expect
> these issues will be fixed by tomorrow?
> 
> 
> Regards,
> Vladimir



***SPAM*** Re: [ewg] OFED October 27 2008 meeting summary on OFED 1.4 status

2008-10-28 Thread Olga Shern (Voltaire)
> 2. We had a discussion on NFS-RDMA since both RHEL 5.1 and SLES10 SP2
> backports are not working well
> We had a debate - do we take it out of OFED since it is not working on
> the distros
> Leave it in: We can have bug fixes for 1.4.1, and give customers a
> platform to play with
> Take it out: If someone will try it on the distro experience can be
> problematic
> Decision: We will leave it for 2.6.27 kernel only.
> All testing should be done on this kernel mainly to see that basic
> functionality is working

We have tested NFSoRDMA on 2.6.27 and didn't see any of the issues
that we see on Distros.
So basic functionality is working


***SPAM*** Re: [ewg] ***SPAM*** NFS-RDMA compilation problem

2008-10-25 Thread Olga Shern (Voltaire)
Hi Amar,

I suggest you open a bug in the OpenFabrics bugzilla:
https://bugs.openfabrics.org/.

Thanks
Olga

On Thu, Oct 23, 2008 at 4:50 PM, Amar Mudrankit
<[EMAIL PROTECTED]> wrote:
> While I was trying to install OFED-1.4-rc3 over SLES 10 SP 2 with
> NFS-RDMA selected for installation, I got the following error message:
>
> nfs-utils-1.1.1 rpm is required to build kernel-ib
>
> I have downloaded and installed successfully, the nfs-utils-1.1.4
> **source .tgz** from   http://www.kernel.org/pub/linux/utils/nfs,
> still I was hit with the same error message.
>
> I was not able to find out nfs-utils rpm that would install over SLES
> 10 SP 2.  Can anybody please point me to the location of rpm? Why is
> OFED installation unable to detect the latest installation of nfs
> utils compiled from source and is fully dependent upon the rpm
> installation?
>
> Regards,
> Amar


[ewg] ***SPAM*** Re: [ofa-general] OFED-1.4-rc3 is available

2008-10-23 Thread Olga Shern (Voltaire)
> - 27 bugs fixed (see attached for details)

Hi Vlad,

I don't see the attached file.

Olga


Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled"

2008-10-02 Thread Olga Shern (Voltaire)
We ran regression tests and they were OK.
We will continue the testing and update if we see any issues.

Olga

On Sun, Sep 28, 2008 at 2:40 PM, Olga Shern (Voltaire)
<[EMAIL PROTECTED]> wrote:
> Hi Eli,
>
> We also want to run regression tests with this patch.
> Please let me know when OFED daily build will include it.
>
> Thanks
> Olga
>
> On Sun, Sep 28, 2008 at 2:39 PM, Eli Cohen <[EMAIL PROTECTED]> wrote:
>> On Fri, Sep 26, 2008 at 01:19:00PM -0700, Roland Dreier wrote:
>>> How about this?  Instead of trying to rely on some complicated and
>>> fragile reasoning about when some race might occur, let's just do what
>>> we want to do anyway and get rid of LLTX.  We change from priv->tx_lock
>>> (taken with IRQ disabling) to netif_tx_lock (taken on with
>>> BH-disabling).  And then we can keep the skb_orphan in the place it is,
>>> since our xmit routine runs with IRQs enabled.
>>>
>>
>> We'll integrate this into ofed 1.4 and monitor this through our
>> regression system.


Re: [ewg] OFED build hangs when trying to build sdpnetstat

2008-10-01 Thread Olga Shern (Voltaire)
Hi,

Please see bugzilla:
https://bugs.openfabrics.org/show_bug.cgi?id=1238

Olga


On Mon, Sep 29, 2008 at 10:17 PM, Woodruff, Robert J
<[EMAIL PROTECTED]> wrote:
>
> Has anyone else seen a problem with the OFED install in
> today's daily build hanging while trying to build sdpnetstat ?
>
> Here are the last few lines in the log file after I did a
> Ctrl-C.
> The hang seems to happen both on EL 5.2 (2.6.18-92.el5) and EL 5.1
> (2.6.18-53.el5).
>
> + unset DISPLAY
> + make netstat
> Configuring the Linux net-tools (NET-3 Base Utilities)...
>
> *
> *
> *  Internationalization
> *
> * The net-tools package has currently been translated to French,
> * German and Brazilian Portugese.  Other translations are, of
> * course, welcome.  Answer `n' here if you have no support for
> * internationalization on your system.
> *
> Does your system support GNU gettext? (I18N) [n] *
> *
> * Protocol Families.
> *
> UNIX protocol family (HAVE_AFUNIX) [y] INET (TCP/IP) protocol family
> (HAVE_AFINET) [y] INET6 (IPv6) protocol family (HAVE_AFINET6) [n] make:
> *** Deleting file `config.h'
> make: *** wait: No child processes.  Stop.
> make: *** Waiting for unfinished jobs
> make: *** wait: No child processes.  Stop.
>


Re: [ewg] Re: Continue of "defer skb_orphan() until irqs enabled"

2008-09-28 Thread Olga Shern (Voltaire)
Hi Eli,

We also want to run regression tests with this patch.
Please let me know when OFED daily build will include it.

Thanks
Olga

On Sun, Sep 28, 2008 at 2:39 PM, Eli Cohen <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 26, 2008 at 01:19:00PM -0700, Roland Dreier wrote:
>> How about this?  Instead of trying to rely on some complicated and
>> fragile reasoning about when some race might occur, let's just do what
>> we want to do anyway and get rid of LLTX.  We change from priv->tx_lock
>> (taken with IRQ disabling) to netif_tx_lock (taken on with
>> BH-disabling).  And then we can keep the skb_orphan in the place it is,
>> since our xmit routine runs with IRQs enabled.
>>
>
> We'll integrate this into ofed 1.4 and monitor this through our
> regression system.


***SPAM*** Re: [ewg] ***SPAM*** Regarding: bonding issue.

2008-09-24 Thread Olga Shern (Voltaire)
Hi Gnana,

First, I would recommend using OFED 1.3.1.
How did you configure bonding?
Please check whether your configuration follows the instructions
in /usr/share/doc/packages/ib-bonding-0.9.0/ib-bonding.txt

Best Regards,
Olga


[ewg] ***SPAM*** Fwd: [ofa-general] ***SPAM*** [PATCH] ipoib: fix deadlock between join completion handler and ipoib_stop

2008-09-21 Thread Olga Shern (Voltaire)
Hi Vlad,

Please add this patch to OFED 1.4

Thanks
Olga


-- Forwarded message --
From: Yossi Etigin <[EMAIL PROTECTED]>
Date: Mon, Sep 15, 2008 at 11:45 PM
Subject: [ofa-general] ***SPAM*** [PATCH] ipoib: fix deadlock between
join completion handler and ipoib_stop
To: Roland Dreier <[EMAIL PROTECTED]>
Cc: general list <[EMAIL PROTECTED]>


Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>

--

Index: b/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- a/drivers/infiniband/ulp/ipoib/ipoib.h  2008-08-27 21:03:44.0 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h  2008-09-15 23:08:30.0 +0300
@@ -293,6 +293,7 @@ struct ipoib_dev_priv {

   struct delayed_work pkey_poll_task;
   struct delayed_work mcast_task;
+   struct work_struct broadcast_join_task;
   struct work_struct flush_light;
   struct work_struct flush_normal;
   struct work_struct flush_heavy;
@@ -464,6 +465,7 @@ int ipoib_dev_init(struct net_device *de
void ipoib_dev_cleanup(struct net_device *dev);

void ipoib_mcast_join_task(struct work_struct *work);
+void ipoib_mcast_broadcast_join_task(struct work_struct *work);
void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);

void ipoib_mcast_restart_task(struct work_struct *work);
Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-08 20:14:08.0 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-09-15 23:07:45.0 +0300
@@ -1075,6 +1075,7 @@ static void ipoib_setup(struct net_devic

   INIT_DELAYED_WORK(&priv->pkey_poll_task, ipoib_pkey_poll);
   INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
+   INIT_WORK(&priv->broadcast_join_task, ipoib_mcast_broadcast_join_task);
   INIT_WORK(&priv->flush_light,   ipoib_ib_dev_flush_light);
   INIT_WORK(&priv->flush_normal,   ipoib_ib_dev_flush_normal);
   INIT_WORK(&priv->flush_heavy,   ipoib_ib_dev_flush_heavy);
Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-09-15 23:02:42.0 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-09-15 23:37:41.0 +0300
@@ -389,6 +389,21 @@ static int ipoib_mcast_sendonly_join(str
   return ret;
}

+void ipoib_mcast_broadcast_join_task(struct work_struct *work)
+{
+   struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+  broadcast_join_task);
+
+   /*
+* Take rtnl_lock to avoid racing with ipoib_stop()
+* and turning the carrier back on while a device
+* is being removed.
+*/
+   rtnl_lock();
+   netif_carrier_on(priv->dev);
+   rtnl_unlock();
+}
+
static int ipoib_mcast_join_complete(int status,
struct ib_sa_multicast *multicast)
{
@@ -415,16 +430,9 @@ static int ipoib_mcast_join_complete(int
  &priv->mcast_task, 0);
   mutex_unlock(&mcast_mutex);

-   if (mcast == priv->broadcast) {
-   /*
-* Take RTNL lock here to avoid racing with
-* ipoib_stop() and turning the carrier back
-* on while a device is being removed.
-*/
-   rtnl_lock();
-   netif_carrier_on(dev);
-   rtnl_unlock();
-   }
+   /* Would deadlock with ipoib_stop if rtnl_lock was taken */
+   if (mcast == priv->broadcast)
+   queue_work(ipoib_workqueue, &priv->broadcast_join_task);

   return 0;
   }
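
The deferral pattern used by this patch can be illustrated with a small single-threaded simulation. Everything below is a hypothetical model, not driver code: struct sim_dev, join_complete() and run_queued_work() stand in for ipoib_dev_priv, ipoib_mcast_join_complete() and the ipoib_workqueue worker, and a boolean replaces the real rtnl_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct work_item {
    void (*fn)(void *arg);
    void *arg;
};

/* Hypothetical model of the device state involved in the deadlock. */
struct sim_dev {
    bool rtnl_held;     /* stands in for rtnl_lock ownership */
    bool carrier_on;
    struct work_item queue[MAX_WORK];
    size_t queued;
};

/* Deferred part: runs later, when taking the lock cannot deadlock. */
static void broadcast_join_task(void *arg)
{
    struct sim_dev *dev = arg;

    assert(!dev->rtnl_held);    /* the real code would block here */
    dev->rtnl_held = true;      /* rtnl_lock() */
    dev->carrier_on = true;     /* netif_carrier_on() */
    dev->rtnl_held = false;     /* rtnl_unlock() */
}

/* Completion handler: never takes the lock, only queues the work. */
static int join_complete(struct sim_dev *dev)
{
    if (dev->queued == MAX_WORK)
        return -1;
    dev->queue[dev->queued++] =
        (struct work_item){ broadcast_join_task, dev };
    return 0;
}

/* Worker: drains the queue and returns how many items it ran. */
static size_t run_queued_work(struct sim_dev *dev)
{
    size_t n = dev->queued;

    for (size_t i = 0; i < n; ++i)
        dev->queue[i].fn(dev->queue[i].arg);
    dev->queued = 0;
    return n;
}
```

The point of the model is the ordering: the completion handler returns while the "stop" path still holds the lock, so the waiter in ib_sa_free_multicast() can make progress, and the carrier is only switched on once the lock is free.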
___
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ewg] FW: [PATCH] ipoib: fix hang while bringing down uninitialized interface

2008-09-07 Thread Olga Shern
Hi Vlad,

Please add this patch to OFED 1.4 

Thanks
Olga

-Original Message-
From: Yossi Etigin [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 05, 2008 6:01 PM
To: Roland Dreier
Cc: general list; Olga Shern
Subject: [PATCH] ipoib: fix hang while bringing down uninitialized
interface

 Fix bug #1172: If a pkey for an interface is not found during
initialization, then poll_timer is left uninitialized. When the
device is brought down, ipoib tries to del_timer_sync() it. This
call hangs in an infinite loop in lock_timer_base(), because
timer_base is NULL. We should check whether the timer was really
initialized.


Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>

--

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 66cafa2..3bbf46d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -850,7 +850,10 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush)
ipoib_dbg(priv, "All sends and receives done.\n");
 
 timeout:
-   del_timer_sync(&priv->poll_timer);
+   /* Make sure the timer is initialized */
+   if (priv->poll_timer.function)
+   del_timer_sync(&priv->poll_timer);
+
qp_attr.qp_state = IB_QPS_RESET;
if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
ipoib_warn(priv, "Failed to modify QP to RESET state\n");


--Yossi
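
A minimal user-space sketch of the guard this patch adds is shown below. struct sim_timer and its helpers are hypothetical stand-ins for the kernel's timer_list API, and the sketch assumes, as the patch does, that the structure starts out zeroed so a NULL callback means the timer was never set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct timer_list. */
struct sim_timer {
    void (*function)(unsigned long data);  /* NULL until set up */
    bool pending;
};

/* Placeholder timer callback. */
static void sim_poll_cb(unsigned long data)
{
    (void)data;
}

/* Mirrors timer setup: records the callback and arms the timer. */
static void sim_timer_setup(struct sim_timer *t, void (*fn)(unsigned long))
{
    assert(fn != NULL);
    t->function = fn;
    t->pending = true;
}

/*
 * Guarded teardown: only touch the timer if it was ever initialized.
 * Returns true when the timer was actually stopped.
 */
static bool sim_timer_stop_if_initialized(struct sim_timer *t)
{
    if (!t->function)
        return false;       /* never initialized: nothing to delete */
    t->pending = false;     /* stands in for del_timer_sync() */
    return true;
}
```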


***SPAM*** Re: [ewg] OFED installation on RH5 UP2

2008-08-28 Thread Olga Shern (Voltaire)
Hi Vlad,

I found another issue with openmpi rpm removal on RH5 UP2.
See patch attached

Thanks
Olga


uninstall.patch
Description: Binary data

***SPAM*** Re: [ewg] ***SPAM*** OFED installation on SLES 10

2008-08-28 Thread Olga Shern (Voltaire)
Thanks Vlad,

It works :)


[ewg] ***SPAM*** OFED installation on SLES 10

2008-08-28 Thread Olga Shern (Voltaire)
Hi Vlad,

I tested OFED 1.4 beta installation on a SLES 10 minimal installation,
and all dependency checks were OK except the kernel sources check.
I think we should add a check for SLES that the kernel-source rpm is installed.
I attached a patch that should fix it.

Thanks
Olga


install.diff
Description: Binary data

RE: [ewg] OFED installation on RH5 UP2

2008-08-27 Thread Olga Shern
Hi Vlad,

I tested this with the OFED 1.4 build and it works.
I saw that you forgot to add a check for whether the rpm-build package
is installed.
Can you please add it?

--- install.pl.orig 2008-08-17 18:58:27.0 +0300
+++ install.pl  2008-08-18 16:34:35.0 +0300
@@ -2431,6 +2431,14 @@ sub check_linux_dependencies
 if (! $check_linux_deps) {
 return 0;
 }
+
+if ($distro eq "redhat" or $distro eq "fedora" or $distro eq 'redhat5')
+{
+   if (not is_installed("rpm-build")) {
+   print RED "rpm-build is required to build OFED", RESET "\n";
+   $err++;
+   }
+}
 for my $package ( @selected_packages ) {
 # Check rpmbuild requirements
 if ($package =~ /kernel-ib|ib-bonding/) {


Thanks
Olga


-Original Message-
From: Olga Shern 
Sent: Thursday, August 21, 2008 4:23 PM
To: '[EMAIL PROTECTED]'
Cc: OF-EWG
Subject: RE: [ewg] OFED installation on RH5 UP2

Thanks Vlad,

It does seem a better solution; I will test it and let you know.

-Original Message-
From: Vladimir Sokolovsky [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 19, 2008 12:15 PM
To: Olga Shern
Cc: OF-EWG
Subject: RE: [ewg] OFED installation on RH5 UP2

On Mon, 2008-08-18 at 20:11 +0300, Olga Shern wrote:
> Hi Vlad,
> 
> I have also tested minimal OS installation.
> We need to test whether the rpm-build package is installed (it is not
> installed by default). Also, redhat-rpm-config is required for all
> debuginfo RPMs.
> Attached is the patch that fixes this.
> 
> Thanks,
> Olga
>  

Hi Olga,
I used a different patch instead because the check for redhat-rpm-config
existence is relevant for all debuginfo RPMs. Please see if this is OK.

Regarding lam MPI, I think, we should check if it was registered with
mpi-selector and re-register it after updating mpi-selector.

Thanks,
Vladimir

From b98936d574b9fed5ff0493ee69d558ca67788810 Mon Sep 17 00:00:00 2001
From: Vladimir Sokolovsky <[EMAIL PROTECTED]>
Date: Tue, 19 Aug 2008 11:54:00 +0300
Subject: [PATCH] redhat-rpm-config RPM should be installed on RedHat Distributions
 in order to build debuginfo RPMs.

Signed-off-by: Vladimir Sokolovsky <[EMAIL PROTECTED]>
---
 install.pl |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/install.pl b/install.pl
index 8966793..5155cfe 100755
--- a/install.pl
+++ b/install.pl
@@ -2451,6 +2451,15 @@ sub check_linux_dependencies
 }
 }
 
+if ($package =~ /debuginfo/ and ($distro eq 'redhat' or $distro eq 'fedora' or $distro eq 'redhat5')) {
+if (not $packages_info{$package}{'rpm_exist'}) {
+if (not is_installed("redhat-rpm-config")) {
+print RED "redhat-rpm-config rpm is required to build $package", RESET "\n";
+$err++;
+}
+}
+}
+
 if (not $packages_info{$package}{'rpm_exist'}) {
 for my $req ( @{ $packages_info{$package}{'dist_req_build'} } ) {
 my ($req_name, $req_version) = (split ('_',$req));
-- 
1.5.4.3




[ewg] Fwd: [ofa-general] [PATCH v3] ib/core: fix for send multicast group send leave retry

2008-08-27 Thread Olga Shern (Voltaire)
Hi Vlad,

Please add this patch to OFED 1.4
It is in Roland's tree for 2.6.28

Thanks
Olga


-- Forwarded message --
From: Yossi Etigin <[EMAIL PROTECTED]>
Date: Aug 11, 2008 7:35 PM
Subject: [ofa-general] [PATCH v3] ib/core: fix for send multicast group send leave retry
To: Roland Dreier <[EMAIL PROTECTED]>
Cc: Olga Shern <[EMAIL PROTECTED]>, general list <[EMAIL PROTECTED]>, Ron Livne <[EMAIL PROTECTED]>


Until now, there was a retry mechanism only for a failed multicast group
join.
This patch adds a mechanism that retries leaving a multicast group
before giving up.

Changes from v1:
- Save the leave state because it's overridden
- use 'else'

Changes from v2:
- Call mcast_work_handler() when send_leave() fails

Signed-off-by: Ron Livne <[EMAIL PROTECTED]>
Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>


Index: b/drivers/infiniband/core/multicast.c
===
--- a/drivers/infiniband/core/multicast.c   2008-08-11 19:13:26.0 +0300
+++ b/drivers/infiniband/core/multicast.c   2008-08-11 19:34:21.0 +0300
@@ -106,6 +106,8 @@ struct mcast_group {
   struct ib_sa_query  *query;
   int query_id;
   u16 pkey_index;
+   u8  leave_state;
+   int retries;
};

struct mcast_member {
@@ -350,6 +352,7 @@ static int send_leave(struct mcast_group

   rec = group->rec;
   rec.join_state = leave_state;
+   group->leave_state = leave_state;

   ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device,
  port->port_num, IB_SA_METHOD_DELETE, &rec,
@@ -542,7 +545,11 @@ static void leave_handler(int status, st
{
   struct mcast_group *group = context;

-   mcast_work_handler(&group->work);
+   if (status && (group->retries > 0) &&
+   !send_leave(group, group->leave_state))
+   group->retries--;
+   else
+   mcast_work_handler(&group->work);
}

static struct mcast_group *acquire_group(struct mcast_port *port,
@@ -565,6 +572,7 @@ static struct mcast_group *acquire_group
   if (!group)
   return NULL;

+   group->retries = 3;
   group->port = port;
   group->rec.mgid = *mgid;
   group->pkey_index = MCAST_INVALID_PKEY_INDEX;
-- 
--Yossi

___
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ewg] OFED installation on RH5 UP2

2008-08-13 Thread Olga Shern
Hi Vlad,

I see OFED installation / uninstallation (install.pl / uninstall.sh)
failures on a full RH5 UP2 OS installation.

There are three different issues:

1.  Relevant only for uninstall.sh: on RH5 UP2, openmpi is installed as
    both a 32-bit and a 64-bit library. To remove it, the --allmatches
    flag needs to be added to the rpm -e command.

2.  On RH5 UP2, compat-dapl-1.2.5 is installed, and rpm -q compat-dapl
    returns NULL in this case. compat-dapl-1.2.5 needs to be added to the
    rpms that are removed in uninstall.sh.

The attached patch fixes the above two issues.

3.  lam MPI and mpi-selector RPMs are installed on RH5 UP2. lam MPI
    depends on mpi-selector, therefore mpi-selector cannot be removed.

    I don't have a simple solution for this because we cannot remove
    lam MPI. One solution I can think of is not to remove mpi-selector
    but only to update it.

    What do you think?

Regards,
Olga


uninstall.sh.patch
Description: uninstall.sh.patch

[ewg] RE: Patches for OFED 1.4 beta

2008-08-12 Thread Olga Shern
Thanks Jack,
We will send it

Olga

-Original Message-
From: Jack Morgenstein [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 12, 2008 4:47 PM
To: Olga Shern
Cc: Vladimir Sokolovsky; OF-EWG; Ron Livne; [EMAIL PROTECTED]
Subject: Re: Patches for OFED 1.4 beta

On Tuesday 12 August 2008 16:39, Olga Shern wrote:
> Hi Vlad,
>
> Please add the attached patches (in emails) to OFED 1.4 beta.
> These patches should be applied after Jack's XRC patches.
>
Please also send the librdmacm patch -- We'll open a librdmacm library git
for ofed_1_4 on the OpenFabrics git server (in the interest of saving time
so close to the beta), complete with "fixes" directory, and add your
patch to the librdmacm fixes, and integrate it into the ofed 1.4 build.

- Jack


RE: [ewg] [PATCH] ipiob: fix rtnl deadlock

2008-08-12 Thread Olga Shern
Thanks

-Original Message-
From: Vladimir Sokolovsky [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 12, 2008 3:26 PM
To: Yosef Eitgin
Cc: Olga Shern; OF-EWG
Subject: Re: [ewg] [PATCH] ipiob: fix rtnl deadlock

Yossi Etigin wrote:
> This fixes bug #1114 in bugzilla, which is a deadlock between ipoib_stop
> and mcast_join_task.
> 
> ipoib_stop is called with rtnl_lock, and flushes ipoib_workqueue.
> the flush operation might wait for mcast_join_task to finish, which
> in turn might wait for rtnl_lock.
> 
> Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>
> 
> -- 
> 

Added to ofed_1_4/linux-2.6.git ofed_kernel
kernel_patches/fixes/ipoib_0400_fix_rtnl_deadlock.patch

Regards,
Vladimir


[ewg] RE: [PATCH] ipiob: fix rtnl deadlock

2008-08-12 Thread Olga Shern
Hi Vlad,

Please add this patch to OFED 1.4 beta. This is only a preliminary
solution; once we have a solution that is acceptable for the kernel, we
will replace it.

Thanks
Olga


-Original Message-
From: Yosef Eitgin 
Sent: Monday, August 11, 2008 8:25 PM
To: Vladimir Sokolovsky
Cc: OF-EWG; Olga Shern; Tziporet Koren
Subject: [PATCH] ipiob: fix rtnl deadlock

This fixes bug #1114 in bugzilla, which is a deadlock between ipoib_stop
and mcast_join_task.

ipoib_stop is called with rtnl_lock held, and flushes ipoib_workqueue.
The flush operation might wait for mcast_join_task to finish, which
in turn might wait for rtnl_lock.

Signed-off-by: Yossi Etigin <[EMAIL PROTECTED]>

--

Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c   2008-08-04 18:09:33.0 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c   2008-08-04 18:39:08.0 +0300
@@ -504,6 +504,7 @@
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, mcast_join_task.work);
struct net_device *dev = priv->dev;
+   int ret;

if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
return;
@@ -577,9 +578,16 @@
priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));

if (!ipoib_cm_admin_enabled(dev)) {
-   rtnl_lock();
-   dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
-   rtnl_unlock();
+   /* Avoid deadlock with ipoib_stop */
+   while (!(ret = rtnl_trylock()) &&
+  test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+   yield();
+
+   if (ret) {
+   dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
+   rtnl_unlock();
+   } else
+   ipoib_dbg_mcast(priv, "ignoring mtu setup because device is down\n");
}

ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

--
--Yossi



RE: [ewg] patch for install.pl script

2008-08-04 Thread Olga Shern
Great :)

-Original Message-
From: Vladimir Sokolovsky [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 04, 2008 11:44 AM
To: Olga Shern
Cc: ewg@lists.openfabrics.org
Subject: Re: [ewg] patch for install.pl script

Olga Shern wrote:
> Hi Vlad,
>
> The attached patch slightly changes the install.pl RPM dependency tests.
> Instead of exiting immediately when some RPM is not installed, it runs the
> full test and displays all missing RPMs.
> This helps the user know at once which RPMs he needs to install.
>
> Olga
>

Hi Olga,
Thanks, but I already made the required changes yesterday:

http://lists.openfabrics.org/pipermail/ewg/2008-August/007407.html

Commit:
http://www.openfabrics.org/git/?p=~vlad/ofed_1_4_scripts.git;a=commit;h=da29b4c6160a6cc5940d7e6041dc92eba93b3884

Regards,
Vladimir


[ewg] patch for install.pl script

2008-08-04 Thread Olga Shern
Hi Vlad,

The attached patch slightly changes the install.pl RPM dependency tests.
Instead of exiting immediately when some RPM is not installed, it runs the
full test and displays all missing RPMs.
This helps the user know at once which RPMs he needs to install.

 

Olga



install.patch
Description: install.patch

Re: [ewg] Ofed1.3 bonding problem

2008-07-15 Thread Olga Shern (Voltaire)
On 7/15/08, Acero Fernandez Alicia <[EMAIL PROTECTED]> wrote:
>
> Hi everybody,
>
> I have tried what is explained in the ib-bonding.txt file, but it doesn't
> work. What are the lines in the /etc/modprobe.conf for Redhat Enterprise
> linux 4 up 5? Or perhaps there is some more information needed.
>
There is no need to add anything to /etc/modprobe.conf if your OS is RH4 UP5

> Could anyone help me?
>

Can you please send your bonding network scripts, the ifconfig output,
and the dmesg output?


Thanks
Olga


FW: [ewg] Ofed1.3 bonding problem

2008-07-10 Thread Olga Shern
Hi Alicia,

You are right: the bonding package inside OFED replaces the bonding package
that is installed on your OS (ib-bonding is installed under
/lib/modules//updates, so if you remove it, native bonding will
work again).

I assume that your OS is RH 4.
Indeed, OFED 1.3 had a bug where Ethernet bonding didn't work
on RH4, but there is a workaround in OFED 1.3.1.
The ib-bonding rpm includes an ib-bonding.txt file with instructions
on how to configure Ethernet bonding:
(/usr/share/doc/packages/ib-bonding-0.9.0/ib-bonding.txt)

3.3 Configuring Ethernet slaves
-------------------------------
It is not possible to have a mix of Ethernet slaves and IPoIB slaves under the
same bonding master. It is possible however that a bonding master of Ethernet
slaves and a bonding master of IPoIB slaves will co-exist in one machine.
To configure Ethernet slaves under a bonding master use the same instructions
as for IPoIB slaves (according  to the OS) with one exception. When working
under Redhat-AS4 do the following when configuring a bonding  master with
Ethernet slaves

- In the master configuration file add the line
SLAVEDEV=1
- In the slave configuration file leave the line
TYPE=InfiniBand

This bug will be fixed in OFED 1.4.

Please let me know if it helps.

Best Regards
Olga


-- Forwarded message --
From: Acero Fernandez Alicia <[EMAIL PROTECTED]>
Date: Jul 10, 2008 10:51 AM
Subject: [ewg] Ofed1.3 bonding problem
To: ewg@lists.openfabrics.org



Hi everybody,


I am trying to install ofed1.3 in my cluster. I have set up ethernet
bonding on some network interfaces and I would like to install
ofed1.3, but when I try, the ethernet network connection is lost. I have
been looking for a solution, and I have found that the module name for
ethernet bonding and for infiniband bonding is the same, so perhaps
that is the reason. Is that true? In that case, how could I solve it? It
doesn't seem to be solved in ofed1.3.1, because I have tried to install
it and the same thing happens.

Could you help me, please?

Regards
Alicia

Re: [ewg] Ofed1.3 bonding problem

2008-07-10 Thread Olga Shern (Voltaire)
Hi Alicia,

You are right: the bonding package inside OFED replaces the bonding package
that is installed on your OS (ib-bonding is installed under
/lib/modules//updates, so if you remove it, native bonding will
work again).

I assume that your OS is RH 4.
Indeed, OFED 1.3 had a bug where Ethernet bonding didn't work
on RH4, but there is a workaround in OFED 1.3.1.
The ib-bonding rpm includes an ib-bonding.txt file with instructions
on how to configure Ethernet bonding:
(/usr/share/doc/packages/ib-bonding-0.9.0/ib-bonding.txt)

3.3 Configuring Ethernet slaves
-------------------------------
It is not possible to have a mix of Ethernet slaves and IPoIB slaves under the
same bonding master. It is possible however that a bonding master of Ethernet
slaves and a bonding master of IPoIB slaves will co-exist in one machine.
To configure Ethernet slaves under a bonding master use the same instructions
as for IPoIB slaves (according  to the OS) with one exception. When working
under Redhat-AS4 do the following when configuring a bonding  master with
Ethernet slaves

- In the master configuration file add the line
SLAVEDEV=1
- In the slave configuration file leave the line
TYPE=InfiniBand

This bug will be fixed in OFED 1.4.

Please let me know if it helps.

Best Regards
Olga


On 7/10/08, Acero Fernandez Alicia <[EMAIL PROTECTED]> wrote:
>
> Hi everybody,
>
>
> I am trying to install ofed1.3 in my cluster. I have set up ethernet bonding
> on some network interfaces and I would like to install ofed1.3, but when I
> try, the ethernet network connection is lost. I have been looking for a
> solution, and I have found that the module name for ethernet bonding and for
> infiniband bonding is the same, so perhaps that is the reason. Is that true?
> In that case, how could I solve it? It doesn't seem to be solved in
> ofed1.3.1, because I have tried to install it and the same thing happens.
>
> Could you help me, please?
>
> Regards
> Alicia
>


Re: [ewg] Agenda for the OFED meeting today (May 5)

2008-05-19 Thread Olga Shern (Voltaire)
On 5/19/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> This is the agenda for the OFED meeting today:
> 1. OFED 1.3.1:
>   1.1 Schedule:
>rc1 - done on May 6
>rc2 - May 22 <== I propose to delay to Thursday since there are
> few IPOIB bugs on work
>GA  - May 29
>   1.2 OS support:
>SLES10 SP2 backports were done (thanks to Moshe from Voltaire)
>There is a request for RHEL 5.2 - who has this OS and can help
> with the backports?
>   1.3 Bugs status
>Please set release version 1.3.1 for all bugs that should be
> resolved in 1.3.1
>In the way the bugs are assigned today it is very hard to
> extract the relevant bugs for the release.
>This is the list of bugs that should be resolved to my best
> knowledge (please add more):


There is also bug number 1004
1004  maj P2 RHEL
[EMAIL PROTECTED]  IPoIB failed on stress testing

> 1024  normal    [EMAIL PROTECTED]  Bonding-Ping not recovery after reconnect the non active interface
> 1027  normal    [EMAIL PROTECTED]  kernel panic in mad.c handle_outgoing_dr_smp with RESULT_CONSUMED
> 1031  normal    [EMAIL PROTECTED]  OpenSM fat tree routing thinks fat tree isn't
> 1032  critical  [EMAIL PROTECTED]  RHEL 5.1 and OFED 1.3 cannot write IO blocks greater than 1024.
> 1038  normal    [EMAIL PROTECTED]  Kernel panic while running tcp/ip ltp tests
> 1040  normal    [EMAIL PROTECTED]  Kernel Oops during "port up/down test"
> 1041  normal    [EMAIL PROTECTED]  Install Failed with memtrack flag in the conf file
> 1042  normal    [EMAIL PROTECTED]  ofed-1.3.1 install fails
>
> 2. OFED 1.4:
>- Kernel rebase status: we have prepared the new tree, make-dist
> pass but compilation still fails.
>  Any help to resolve compilation issues is welcome.
>  URL: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
> ofed_kernel
>- Update from the participants (mainly on new
> components/features):
>  - NFSoRDMA - Jeff
>  - Management - Sasha
>  - Multiple EQs to best fit multi-core systems - we try to
> define it with Roland
>  - RDMA CM to support IPv6 - Woody any news on this?
>  - IB BMME and iWARP equivalent memory extensions - under
> progress on the general list
>
> 3. Open discussion
>   - Upgrade memory in the OFA server:
> This request raised long time ago and we had a promise to do it
> after 1.3 release. What is the status?
>   - Other topics ...
>
> Tziporet
>
>

Re: [ewg] Fwd: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index

2008-05-14 Thread Olga Shern (Voltaire)
On 5/14/08, Vladimir Sokolovsky <[EMAIL PROTECTED]> wrote:
>
> Olga Shern (Voltaire) wrote:
>
> >
> >
> >
> >
> >Hello Olga,
> >This patch can't be applied as is to the ofed-1.3.1 git tree:
> >
> >patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
> >Hunk #1 succeeded at 847 (offset -160 lines).
> >patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
> >Hunk #1 succeeded at 488 (offset -106 lines).
> >Hunk #2 FAILED at 729.
> >1 out of 2 hunks FAILED -- saving rejects to file
> >drivers/infiniband/ulp/ipoib/ipoib_ib.c.rej
> >
> >Can you recreate this patch against
> >git://git.openfabrics.org/ofed_1_3/linux-2.6.git
> ><http://git.openfabrics.org/ofed_1_3/linux-2.6.git> ofed_kernel?
> >
> >Regards,
> >Vladimir
> >
> >
> >  It was applied without issues on OFED 1.3, therefore I sent it as is.
> > I will recreate it against OFED 1.3.1
> >  Olga
> >
> >
> I added this patch (as is) to ofed-1.3.1 kernel git tree as
> kernel_patches/fixes/ipoib_0360_Handle_case_when_P_Key_is_deleted.patch.
> Thanks,
>
> Regards,
> Vladimir



Great,

Thanks
Olga

Re: [ewg] Fwd: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index

2008-05-13 Thread Olga Shern (Voltaire)
>
> >
>
> Hello Olga,
> This patch can't be applied as is to the ofed-1.3.1 git tree:
>
> patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
> Hunk #1 succeeded at 847 (offset -160 lines).
> patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
> Hunk #1 succeeded at 488 (offset -106 lines).
> Hunk #2 FAILED at 729.
> 1 out of 2 hunks FAILED -- saving rejects to file
> drivers/infiniband/ulp/ipoib/ipoib_ib.c.rej
>
> Can you recreate this patch against
> git://git.openfabrics.org/ofed_1_3/linux-2.6.git ofed_kernel?
>
> Regards,
> Vladimir
>


It was applied without issues on OFED 1.3, therefore I sent it as is.
I will recreate it against OFED 1.3.1.

Olga

[ewg] Compiling OFED 1.3 on Gentoo

2008-05-12 Thread Olga Shern
Hi,

We are trying to compile OFED 1.3 on Gentoo and see the following error.
The build fails on the libibcommon library with the error below.

Running  rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir'
--define 'dist ' --target i386 --define '_prefix /usr' --define
'_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr'
/tmp/OFED-1.3/SRPMS/libibcommon-1.0.8-1.ofed1.3.src.rpm
error: Macro %dist has empty body
error: Macro %dist has empty body
sh: line 0: fg: no job control
error: Failed build dependencies:
 is needed by libibcommon-1.0.8-1.ofed1.3.src Installing
/tmp/OFED-1.3/SRPMS/libibcommon-1.0.8-1.ofed1.3.src.rpm
Building target platforms: i386
Building for target i386

There is a strange space under the 'error:' line, before 'is needed by
libibcommon-1.0.8-1.ofed1.3.src'.

But if I install the source RPM file and then run 'rpmbuild -ba
libibcommon.spec', I can build the RPM, so only the rpmbuild --rebuild
command causes problems.

Has anyone seen this error before?
Has anyone succeeded in building OFED 1.3 on Gentoo?

Thanks
Olga


[ewg] Fwd: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index

2008-05-12 Thread Olga Shern (Voltaire)
Hi Vlad,

Please add this patch to OFED 1.3.1.
In addition to its main purpose, this patch also fixes issues we saw with
partitioning and SM failover, because of this part of the change:

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks
Olga


-- Forwarded message --
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Apr 15, 2008 8:55 AM
Subject: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index
To: [EMAIL PROTECTED]

If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <[EMAIL PROTECTED]> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
---
drivers/infiniband/ulp/ipoib/ipoib_cm.c |    4 ++--
drivers/infiniband/ulp/ipoib/ipoib_ib.c |   10 +++++-----
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 9d411f2..9db7b0b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1007,9 +1007,9 @@ static int ipoib_cm_modify_tx_init(struct net_device *dev,
   struct ipoib_dev_priv *priv = netdev_priv(dev);
   struct ib_qp_attr qp_attr;
   int qp_attr_mask, ret;
-   ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index);
+   ret = ib_find_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index);
   if (ret) {
-   ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret);
+   ipoib_warn(priv, "pkey 0x%x not found: %d\n", priv->pkey, ret);
   return ret;
   }

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 8b4ff69..0205eb7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -594,7 +594,7 @@ static void ipoib_pkey_dev_check_presence(struct net_device *dev)
   struct ipoib_dev_priv *priv = netdev_priv(dev);
   u16 pkey_index = 0;

-   if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index))
+   if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index))
   clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
   else
   set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
@@ -835,13 +835,13 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int pkey_event)
   clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
   ipoib_ib_dev_down(dev, 0);
   ipoib_ib_dev_stop(dev, 0);
-   ipoib_pkey_dev_delay_open(dev);
-   return;
+   if (ipoib_pkey_dev_delay_open(dev))
+   return;
   }
-   set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);

   /* restart QP only if P_Key index is changed */
-   if (new_index == priv->pkey_index) {
+   if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) &&
+   new_index == priv->pkey_index) {
   ipoib_dbg(priv, "Not flushing - P_Key index not changed.\n");
   return;
   }
--
1.5.5


Re: [ewg] RE: [ofa-general] OFED May 5 meeting summary

2008-05-12 Thread Olga Shern (Voltaire)
On 5/12/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
> Moshe Kazir wrote:
>
> >
> > I have checked OFED-1.3.1-rc1 on SLES10 SP 2 Beta3.
> >
> > ib-bonding compile failed.  Everything else is compiled o.k.
> > Attached : ib-bonding error log.
> >
> >
> > I'll take the backport of ib-bonding to sles10 sp 2 on me (if needed,
> > I'll get Moni's help).
> >
> >
> >
> Thanks
> Please update when done.
> Any need for a change in the install script?


It seems that there is no need for changes in the install script.
I will update you.

> Tziporet





Re: [ewg] OFED May 5 meeting summary

2008-05-11 Thread Olga Shern (Voltaire)
On 5/6/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
>
> May 5 OFED meeting summary:
> ===
>
> 1. OFED 1.3.1:
>1.1  Status of changes:
>IB-bonding - on work
>SRP failover - done (need more testing)
>SDP crashes - on work (not clear if we will have
> something on time)
>RDS fixes for RDMA API - done
>librdmacm 1.0.7 - done
>uDAPL updates - done
>Open MPI 1.2.6 - done
>MVAPICH 1.0.1 - done
>MVAPICH2 1.0.3 - done
>IPoIB - 2 bugs fixed. There are still two issue that
> should be resolved.
>Low level drivers: Changes that already committed:
>nes
>mlx4
>cxgb3
>ehca
>
>1.2 Schedule:
>rc1 - was released today
>rc2 - May 20
>GA  - May 29
>
>1.3 Discussion:
>- ipath driver is going to be updated
>- There is an issue of bonding and Ethernet drivers on RHEL4 -
> under debug
>- We wish to add support for SLES10 SP2. Already got an approval
> from Novell
>Any volunteer to provide the new backport patches?



Tziporet, we will do it.
We have already started; it seems that everything compiles, and only
bonding needs to be backported.

Olga

> 2. OFED 1.4:
>   Updated that the new tree will be ready next week - based on
> 2.6.26-rc
>
> 3. Update on OpenSuSE build system - Yiftah updated on the work that is
> done and problems:
>   - The system requires clean RPMs only (no use of install script) -
> they work to resolve
>   - We target this system toward releases (and not to replace the daily
> build system).
>   - we may try now with OFED 1.3.1
>
>
> Tziporet

[ewg] ib/ipoib: don't drop multicast sends when it can be avoided

2008-04-30 Thread Olga Shern
Hi Vlad,

Please apply the attached patch.

It was already applied in the upstream kernel:
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Froland%2Finfiniband.git;a=commitdiff_plain;h=b3e2749bf32f61e7beb259eb7cfb066d2ec6ad65

Patch description:

When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result, for some
window of time, multicast packets are neither transmitted nor queued
but rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over, which both calls set_multicast_list() on the new
   active slave and sends a Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code, which both calls set_multicast_list() on the
   device and issues an IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour entry is renewed (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn of it from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
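
To illustrate the description above, here is a minimal user-space sketch of
the two checks. It is not the attached patch and not the driver code: the
bit values and the helper names mcast_send_allowed_old() and
mcast_send_allowed_new() are hypothetical, and only the two flag names come
from the description.

```c
#include <stdbool.h>

/* Bit values are illustrative only; the real definitions are in the driver. */
enum {
    IPOIB_FLAG_OPER_UP  = 1u << 0,
    IPOIB_MCAST_STARTED = 1u << 1,
};

/* Hypothetical helper: behaviour before the fix, as described.
 * Packets are dropped unless the multicast task has been started. */
bool mcast_send_allowed_old(unsigned long flags)
{
    return (flags & IPOIB_MCAST_STARTED) != 0;
}

/* Hypothetical helper: behaviour after the fix, as described.
 * Packets are sent or queued whenever the interface is operationally up. */
bool mcast_send_allowed_new(unsigned long flags)
{
    return (flags & IPOIB_FLAG_OPER_UP) != 0;
}
```

In the window right after set_multicast_list(), the interface is up but
IPOIB_MCAST_STARTED is cleared, so flags would be just IPOIB_FLAG_OPER_UP:
the old check rejects the packet and the new check accepts it.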

Thanks
Olga



ipoib_0350_mcast_send.patch
Description: ipoib_0350_mcast_send.patch

Re: [ewg] OFED April 21 meeting summary

2008-04-28 Thread Olga Shern (Voltaire)
On 4/28/08, Roland Dreier <[EMAIL PROTECTED]> wrote:
>
> > Also it is very important for us that IPoIB 2 kernel panics will be
> > fixed (https://bugs.openfabrics.org/show_bug.cgi?id=989,
> > https://bugs.openfabrics.org/show_bug.cgi?id=985)
>
> Are either of these panics seen with upstream kernels?
>
https://bugs.openfabrics.org/show_bug.cgi?id=989 is an OFED bug.

For https://bugs.openfabrics.org/show_bug.cgi?id=985, we will try to
reproduce it on an upstream kernel and let you know.

Re: [ewg] OFED April 21 meeting summary

2008-04-28 Thread Olga Shern (Voltaire)
Hi Tziporet,

I was on vacation and therefore couldn't attend this meeting, so I want to
give an update on Voltaire's plans for OFED 1.3.1.
We are working on bug fixes for bonding and HA: minimal impact on traffic,
multicast and partitioning during SM failover.

It is also very important for us that the two IPoIB kernel panics be fixed
(https://bugs.openfabrics.org/show_bug.cgi?id=989,
https://bugs.openfabrics.org/show_bug.cgi?id=985).

Best Regards,
Olga


On 4/22/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
>  OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 development:
>
> 1. OFED 1.3.1:
>
>1.1  Planned changes:
>
>   ULPs changes:
>
>  IB-bonding - done
>  SRP failover - on work
>  SDP crashes - on work
>  RDS fixes for RDMA API - done
>  librdmacm 1.0.7 - done
>  Open MPI 1.2.6 - done
>  uDAPL - on work
>
>   Low level drivers: - each HW vendor should reply when the
>   changes will be ready
>
>  nes - will be ready in the first week of May
>  mlx4 - fixes are ready; changes to support Eth are under
>  review for submission to the kernel, so it is not clear if they
>  will make it on time.
>
>  cxgb3 - will be ready by the middle of May. Majority of
>  changes should be submitted for RC1.
>  ipath - waiting for update from Betsy
>  ehca - waiting for update from Christoph
>
>1.2 Schedule: we agreed that 2 release candidates should be
>sufficient
>
>   GA is planned for May-29
>   - RC1 - May 6
>   - RC2 - May 20
>
>Note: daily builds of 1.3.1 are already available at:
>http://www.openfabrics.org/builds/ofed-1.3.1
>
>
> 2. OFED 1.4:
>
>Release features were presented at Sonoma (presentation available at
>http://www.openfabrics.org/archives/april2008sonoma.htm)
>
>IPv6: Woody is looking for resources to add IPv6 support to the CMA.
>Hal noted that it will require a change in opensm too.
>
>Xsigo Vnic & Vhba - Not clear if they will make it
>
>Kernel tree is under work at:
>git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel
>We should try to get the kernel code to compile as soon as possible
>so everybody will be able to contribute code.
>
>Schedule reminder:
>==================
>Release:         Oct 06, 2008
>Features freeze: Jun 25, 08 (kernel 2.6.26 based)
>Alpha:           Jul 9, 08
>Beta:            Jul 30, 08 kernel 2.6.27-rcX (assuming it will be available)
>RC1:             Aug 13, 08
>RC2:             Aug 27, 08
>RC3-RC5/6:       every 5-10 days
>Latest RC to be used in OFA interop event
>GA:              Oct 06 08
>
>
> Tziporet
>
>

Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans

2008-03-27 Thread Olga Shern
On Mon, Mar 24, 2008 at 10:45 PM, Tziporet Koren <[EMAIL PROTECTED]>
wrote:

>
> OFED March 24 meeting summary about OFED 1.4 and 1.3.1 plans:
>
> *1.3.1 Release:*
> As we decided, we should do a release 2-3 months after 1.3.
> In addition if there are any special fixes as outcome from the interop we
> can do a release earlier.
> All - please send me your requests for fixed issues and needed time frame
> and I will publish 1.3.1 schedule based on this.
>

Hi Tziporet,

Our plans for OFED 1.3.1 are:

1. Fix issues related to SM failover:

   - packets lost in IPoIB
   - packets lost during bonding failover
   - some issues with multicast

We haven't opened bugs in bugzilla about these yet.

2. New bonding rpm with additional fixes - already done
3. We again see issues with "PKey table reordering stops ipoib traffic" -
bug 420
4. IPoIB bug fix - multicast packets dropped when calling set_multicast_list
- already sent upstream and applied - I will send this patch to OFED.

We would like the following bugs to be fixed:
1. Bug 985 - IPoIB panic during openibd stop
2. SDP kernel panic bugs (971, 969)
Best Regards,
Olga

Re: [ewg] OFED Feb-25 meeting summary

2008-02-26 Thread Olga Shern
Hi Tziporet,

Sorry I missed the meeting; I was in the middle of debugging something
and didn't pay attention to the time.

We are continuing the tests and have not seen any additional critical
issues except bug 962, which I found this morning.

Olga  (Voltaire)


On 2/26/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
>
> OFED Feb-25 meeting summary on OFED 1.3 GA readiness:
>
> 1. Agreed schedule:
>   RC6 - Feb 25 - done
>   GA - Feb 28
>
> 2. Status update
>Intel - weekend - run fine
>Qlogic - OK
>IBM - OK
>Neteffect - passing acceptance - no showstoppers
>Mellanox - OK
>Voltaire - no participation
>Chelsio - No participation
>
> 3. Bugs status:
>   As of yesterday there are no bugs that should hold the release
>
> 4. Open discussion
>a. Need to test interop between OFED 1.2 and OFED 1.3
>   Woody from Intel already checked it and basic functionality
> is working.
>b. OFED to be used in the interop event and plug-fest:
>   Rupert reported they are going to use OFED 1.3
>   There was a concern from Qlogic that they might need some
> change after RC6 but they found it is not needed
>c. Support (dot) releases:
>   - All agreed that we might need dot releases in case of
> critical issues
>   - Concern: the dot releases are less tested and QAed
> compared to the major releases
>   - We should have at least 1 month between dot releases
>   - Must ensure only bug fixes are included in dot releases (no
> API/base kernel changes)
>
>
> Tziporet

Re: [ewg] Do you plan to support ib-bonding on RHEL-5 up1?

2008-02-25 Thread Olga Shern
Tziporet, sorry for the late response.

Yes, we support bonding on RH5 UP1


On 2/18/08, Tziporet Koren <[EMAIL PROTECTED]> wrote:
>
> Thanks,
> Tziporet