Re: [ewg] OFA server fs is full

2009-07-23 Thread Ido Rosen
I've been extremely busy these past few weeks and haven't had a chance
to do much related to SA work on the server.  I'll look into this
tonight, but any migration might have to wait until the weekend.

On Thu, Jul 23, 2009 at 12:24 PM, Jeff Becker wrote:
> Hi Tziporet. I believe Ido is working on moving us to the new server.
>
> -jeff
>
> Tziporet Koren wrote:
>> Sasha Khapyorsky wrote:
>>
>>> Now there is:
>>>
>>> Filesystem           1K-blocks      Used Available Use% Mounted on
>>> /dev/sda1            151873632 139556036   4602784  97% /
>>>
>>>
>>> We will have a next "overflow" just in few days.
>>>
>>>
>>>
>>>
>> Jeff
>> When will we move to the new server?
>> Can Ido help with this?
>>
>> Tziporet
>>
>>
>
>
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] RE: Compile error on 7/14 daily build of OFED-1.5

2009-07-23 Thread Woodruff, Robert J
 

This compile error still exists in today's daily build.
I did a little investigation and it looks like it is 
an include file search order problem. It looks like the
include file  was added to bitops.h for ia64 
for a backport for addr.c, however with this, it causes
the compile problem in kobject_backport.c . 
I was able to work around the problem with the following changes
to bitops.h, but I am not sure this is the best way to fix it.



#ifndef BACKPORT_ASM_BITOPS_H
#define BACKPORT_ASM_BITOPS_H

#include_next 
#if defined(__ia64__)
/* #include  */   /* <- causes a compile problem in kobject_backport.c */

/*
 * If I only add the defines that are needed from system.h instead,
 * everything seems to compile and run OK.
 */
#define mb() ia64_mf()
#ifdef CONFIG_SMP
#define smp_mb() mb()
#else
#define smp_mb() barrier()
#endif

#endif

static inline void clear_bit_unlock(unsigned long nr, volatile unsigned long *addr)
{
	smp_mb__before_clear_bit();
	clear_bit(nr, addr);
}

#endif
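The clear_bit_unlock() backport above issues a barrier and then clears the bit, so that stores made while the bit lock was held become visible before the lock looks free. As a rough illustration of that ordering idea only, here is a userland sketch using C11 atomics. It is not kernel code: the sketch_* names are hypothetical, and it uses release ordering, which is weaker than the full smp_mb() in the backport.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userland analogue of a bit-lock acquire: atomically sets
 * bit nr and returns its previous value, with acquire ordering.
 */
static bool sketch_test_and_set_bit_lock(unsigned long nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	atomic_ulong *word;

	assert(addr != NULL);
	word = addr + nr / SKETCH_BITS_PER_LONG;
	return (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask) != 0;
}

/*
 * Hypothetical userland analogue of clear_bit_unlock(): clears bit nr with
 * release ordering, so earlier stores are visible before the bit reads 0.
 */
static void sketch_clear_bit_unlock(unsigned long nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	atomic_ulong *word;

	assert(addr != NULL);
	word = addr + nr / SKETCH_BITS_PER_LONG;
	atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```

A caller would spin on sketch_test_and_set_bit_lock() until it returns false, do its work, and then call sketch_clear_bit_unlock() on the same bit.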

-----Original Message-----
From: Tziporet Koren [mailto:tzipo...@dev.mellanox.co.il] 
Sent: Thursday, July 16, 2009 7:24 AM
To: tzipo...@dev.mellanox.co.il
Cc: Woodruff, Robert J; EWG
Subject: Re: Compile error on 7/14 daily build of OFED-1.5

Tziporet Koren wrote:
> Woodruff, Robert J wrote:
>> I am seeing this build error when trying to compile
>> the 7/14 daily build on EL 5.3 on IA64. Not sure who the maintainer 
>> is of kobject_backport.c,
>> but it looks to be the culprit.
>>
>> woody
>>
>>
>> ckport/2.6.18-EL5.3/include/linux/slab.h:1,
>>  from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.5/drivers/infiniband/core/kobject_backport.c:1:
>>
>> include/linux/bitops.h: At top level:
>> include/linux/bitops.h:57: error: conflicting types for 'fls_long'
>> /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.5/kernel_addons/backport/2.6.18-EL5.3/include/linux/log2.h:64: error: previous implicit declaration of 'fls_long' was here
>> make[4]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.5/drivers/infiniband/core/kobject_backport.o] Error 1
>>   
> There is no specific owner to the backports
> If you have any fix please send it.
> Otherwise we will try to look into it next week
>
> Tziporet
>
I see Jack just fixed it

Tziporet


Re: [ofa-general] Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k

2009-07-23 Thread Roland Dreier

 > This will fix the 2^20 bits limit on our bitmaps once and for all.

Not really... since getting > 128KB of contiguous memory is likely to
fail anyway.

And I don't think the upstream kernel has that limit on kmalloc size
either (at least with SLUB, not sure about SLAB).

Really the long-term fix is to handle non-contiguous memory in the
bitmap allocator.  maybe using vmalloc(), although I always hate big
allocations with vmalloc too.
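
As a rough illustration of what "handle non-contiguous memory in the bitmap allocator" could look like, here is a userland sketch of a bitmap split into fixed-size chunks that are allocated one at a time, so no single allocation grows with the bitmap. It is only a sketch under assumptions: the chunked_* names are hypothetical, calloc() stands in for a page allocation, and CHUNK_BYTES assumes a 4096-byte page.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096u                              /* assumed page size */
#define BITS_PER_CHUNK ((size_t)CHUNK_BYTES * CHAR_BIT)

struct chunked_bitmap {
	size_t nbits;
	size_t nchunks;
	unsigned char **chunks;   /* small pointer table, one entry per chunk */
};

/* Releases every chunk and the pointer table; safe on a partly built bitmap. */
static void chunked_bitmap_free(struct chunked_bitmap *bm)
{
	size_t i;

	if (bm->chunks)
		for (i = 0; i < bm->nchunks; ++i)
			free(bm->chunks[i]);
	free(bm->chunks);
	bm->chunks = NULL;
	bm->nchunks = 0;
	bm->nbits = 0;
}

/* Allocates a zeroed bitmap of nbits bits; returns 0 on success, -1 on failure. */
static int chunked_bitmap_init(struct chunked_bitmap *bm, size_t nbits)
{
	size_t i;

	bm->nbits = nbits;
	bm->nchunks = (nbits + BITS_PER_CHUNK - 1) / BITS_PER_CHUNK;
	bm->chunks = calloc(bm->nchunks ? bm->nchunks : 1, sizeof(*bm->chunks));
	if (!bm->chunks)
		return -1;
	for (i = 0; i < bm->nchunks; ++i) {
		bm->chunks[i] = calloc(1, CHUNK_BYTES);
		if (!bm->chunks[i]) {
			chunked_bitmap_free(bm);
			return -1;
		}
	}
	return 0;
}

static void chunked_set_bit(struct chunked_bitmap *bm, size_t nr)
{
	assert(nr < bm->nbits);
	bm->chunks[nr / BITS_PER_CHUNK][(nr % BITS_PER_CHUNK) / CHAR_BIT] |=
		(unsigned char)(1u << (nr % CHAR_BIT));
}

static bool chunked_test_bit(const struct chunked_bitmap *bm, size_t nr)
{
	assert(nr < bm->nbits);
	return (bm->chunks[nr / BITS_PER_CHUNK][(nr % BITS_PER_CHUNK) / CHAR_BIT]
		>> (nr % CHAR_BIT)) & 1u;
}
```

The cost of this layout is that whole-bitmap helpers such as a find-first-set-bit search would have to walk the chunks one by one instead of scanning a single array.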


[ewg] OFED release status

2009-07-23 Thread Tziporet Koren

Hi All,

I am going on a two-week vacation and want to update you on the status
of the releases:

OFED 1.4.2 - everything is ready except the fix for bug 1678.
After the fix is approved by Jon, Vlad will release it next week.

OFED 1.5 - the alpha release was done today.
Backports for several modules are not complete yet, and we should now move
to the new library scheme we agreed on at Sonoma.

Betsy will replace me at the next EWG meeting on Monday, 27 July, and Jack
will represent Mellanox.



Regards,
Tziporet


[ewg] OFED 1.5 alpha release is available

2009-07-23 Thread Tziporet Koren
OFED 1.5-alpha4 is available

Notes: 

The tarball is available on:
http://www.openfabrics.org/downloads/OFED/ofed-1.5/OFED-1.5-alpha4.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/  for
OFED 1.5

Vladimir & Tziporet




Release information:
--------------------
Linux Operating Systems:
  - RedHat EL4 up6:   2.6.9-67.ELsmp
  - RedHat EL4 up7:   2.6.9-78.ELsmp
  - RedHat EL4 up8:   2.6.9-89.ELsmp
  - RedHat EL5 up2:   2.6.18-92.el5
  - RedHat EL5 up3:   2.6.18-128.el5
  - SLES10 SP2:       2.6.16.60-0.21-smp
  - SLES11:           2.6.27.19-5-default
  - OpenSuSE 10.3:    2.6.22.5-31 *
  - OEL 4 up7:        2.6.9-78.ELsmp
  - OEL 5 up2:        2.6.18-92.el5
  - CentOS5.2:        2.6.18-92.el5
  - CentOS5.3:        2.6.18-128.el5
  - kernel.org:       2.6.29 and 2.6.30

  * Minimal QA for these versions
  
Systems:
  * x86_64
  * x86
  * ia64
  * ppc64

Changes from OFED-1.4.1

1 General changes
  o Kernel code based on 2.6.30

2 SDP
  o Performance improvements

3 uDAPL
  o New library

4 Management
  o OpenSM
    - Mesh Analysis for LASH routing algorithm.
    - Reloadable OpenSM configuration (preliminary implementation)
    - Routing paths sorted balancing (for UpDown and MinHops)
    - Weighted Lid Matrices calculation (for UpDown, MinHop and DOR).
    - I/O nodes connectivity (for FatTree).

5 MPI:
  o For now same versions as in OFED 1.4.1

Tasks that should be completed for the beta
===========================================
1. Complete Backports for all kernel modules
2. Move to new libraries package scheme (as we agreed in Sonoma)
3. SDP Zero Copy
4. Stability, stability, Stability ...



Re: [ewg] OFA server fs is full

2009-07-23 Thread Jeff Becker
Hi Tziporet. I believe Ido is working on moving us to the new server.

-jeff

Tziporet Koren wrote:
> Sasha Khapyorsky wrote:
>   
>> Now there is:
>>
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sda1            151873632 139556036   4602784  97% /
>>
>>
>> We will have a next "overflow" just in few days.
>>
>>
>>   
>> 
> Jeff
> When will we move to the new server?
> Can Ido help with this?
>
> Tziporet
>
>   



Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k

2009-07-23 Thread Doug Ledford

On Jul 23, 2009, at 4:20 AM, Jack Morgenstein wrote:

> On Thursday 16 July 2009 21:08, Doug Ledford wrote:
>> On rhel4 and rhel5 machines, the kmalloc implementation does not
>> automatically forward kmalloc requests > 128kb to __get_free_pages.
>> Please include this patch in all rhel4 and rhel5 backport directories
>> so that we do the right thing in the mthca driver on rhel in regards
>> to kmalloc requests larger than 128k (at least in this code path,
>> there may be others lurking too, I'll forward additional patches if I
>> find they are needed).
>
> commit a7f18a776785aecb5eb9967aef6f0f603b698ba0
> Author: Doug Ledford 
> Date:   Thu Jul 16 12:47:55 2009 -0400
>
>    [mthca] Fix attempts to use kmalloc on overly large allocations
>
>    Signed-off-by: Doug Ledford 

This needs a correct signed-off-by: line.  Mine got added when I put
it in my local git tree, but the original patch came from Red Hat's
bugzilla, bug #508902, author David Jeffery 





> Roland,
> I think that this patch should be taken into the mainstream kernel, rather
> than just as a backport patch for RHEL.  (We can have a similar patch for mlx4).
> I notice that __get_free_pages(), free_pages(), and get_order() are all in the
> mainstream kernel.
>
> This will fix the 2^20 bits limit on our bitmaps once and for all.
> If you agree, I will post this patch and one for mlx4 on the general list.
>
> Doug posted this patch on the EWG list.
>
> Thanks Doug!
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
> index d606edf..312e18d 100644
> --- a/drivers/infiniband/hw/mthca/mthca_mr.c
> +++ b/drivers/infiniband/hw/mthca/mthca_mr.c
> @@ -152,8 +152,11 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
>  		goto err_out;
>  
>  	for (i = 0; i <= buddy->max_order; ++i) {
> -		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> -		buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL);
> +		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
> +		if(s > PAGE_SIZE)
> +			buddy->bits[i] = (unsigned long *)__get_free_pages(GFP_KERNEL, get_order(s));
> +		else
> +			buddy->bits[i] = kmalloc(s, GFP_KERNEL);
>  		if (!buddy->bits[i])
>  			goto err_out_free;
>  		bitmap_zero(buddy->bits[i],
> @@ -166,9 +169,13 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
>  	return 0;
>  
>  err_out_free:
> -	for (i = 0; i <= buddy->max_order; ++i)
> -		kfree(buddy->bits[i]);
> -
> +	for (i = 0; i <= buddy->max_order; ++i){
> +		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
> +		if(s > PAGE_SIZE)
> +			free_pages((unsigned long)buddy->bits[i], get_order(s));
> +		else
> +			kfree(buddy->bits[i]);
> +	}
>  err_out:
>  	kfree(buddy->bits);
>  	kfree(buddy->num_free);
> @@ -178,10 +185,15 @@ err_out:
>  
>  static void mthca_buddy_cleanup(struct mthca_buddy *buddy)
>  {
> -	int i;
> +	int i, s;
>  
> -	for (i = 0; i <= buddy->max_order; ++i)
> -		kfree(buddy->bits[i]);
> +	for (i = 0; i <= buddy->max_order; ++i){
> +		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
> +		if(s > PAGE_SIZE)
> +			free_pages((unsigned long)buddy->bits[i], get_order(s));
> +		else
> +			kfree(buddy->bits[i]);
> +	}
>  
>  	kfree(buddy->bits);
>  	kfree(buddy->num_free);



--

Doug Ledford 

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband







Re: [ewg] OFA server fs is full

2009-07-23 Thread Tziporet Koren

Sasha Khapyorsky wrote:


Now there is:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1            151873632 139556036   4602784  97% /


We will have a next "overflow" just in few days.


  

Jeff
When will we move to the new server?
Can Ido help with this?

Tziporet



Re: [ewg] [Patch mthca backport] Don't use kmalloc > 128k

2009-07-23 Thread Jack Morgenstein
On Thursday 16 July 2009 21:08, Doug Ledford wrote:
> On rhel4 and rhel5 machines, the kmalloc implementation does not  
> automatically forward kmalloc requests > 128kb to __get_free_pages.   
> Please include this patch in all rhel4 and rhel5 backport directories  
> so that we do the right thing in the mthca driver on rhel in regards  
> to kmalloc requests larger than 128k (at least in this code path,  
> there may be others lurking too, I'll forward additional patches if I  
> find they are needed).
> 
> 
commit a7f18a776785aecb5eb9967aef6f0f603b698ba0
Author: Doug Ledford 
Date:   Thu Jul 16 12:47:55 2009 -0400

[mthca] Fix attempts to use kmalloc on overly large allocations

Signed-off-by: Doug Ledford 



Roland,
I think that this patch should be taken into the mainstream kernel, rather
than just as a backport patch for RHEL.  (We can have a similar patch for mlx4).
I notice that __get_free_pages(), free_pages(), and get_order() are all in the
mainstream kernel.

This will fix the 2^20 bits limit on our bitmaps once and for all.
If you agree, I will post this patch and one for mlx4 on the general list.
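
For reference, the 2^20 bits figure follows from simple size arithmetic: a bitmap of 2^20 bits occupies 131072 bytes, which is exactly 128 KB, so any larger level-0 bitmap needs an allocation above the kmalloc limit Doug describes. A small userland sketch of that calculation is below. It reproduces the rounding of BITS_TO_LONGS() under hypothetical SKETCH_* names, and buddy_level_bytes() is an illustrative helper, not a function in the driver.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)
/* Same rounding as the kernel's BITS_TO_LONGS(): longs needed to hold nbits. */
#define SKETCH_BITS_TO_LONGS(nbits) \
	(((nbits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/*
 * Bytes needed for the level-i bitmap of a buddy allocator with the given
 * max_order, mirroring the expression used in mthca_buddy_init().
 */
static size_t buddy_level_bytes(unsigned int max_order, unsigned int i)
{
	assert(i <= max_order);
	assert(max_order - i < sizeof(size_t) * CHAR_BIT);
	return SKETCH_BITS_TO_LONGS((size_t)1 << (max_order - i)) * sizeof(long);
}
```

With max_order 20 the level-0 bitmap is 128 KB and still fits; with max_order 21 it is 256 KB and does not.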

Doug posted this patch on the EWG list.

Thanks Doug!

diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index d606edf..312e18d 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -152,8 +152,11 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
 		goto err_out;
 
 	for (i = 0; i <= buddy->max_order; ++i) {
-		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-		buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL);
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			buddy->bits[i] = (unsigned long *)__get_free_pages(GFP_KERNEL, get_order(s));
+		else
+			buddy->bits[i] = kmalloc(s, GFP_KERNEL);
 		if (!buddy->bits[i])
 			goto err_out_free;
 		bitmap_zero(buddy->bits[i],
@@ -166,9 +169,13 @@ static int mthca_buddy_init(struct mthca_buddy *buddy, int max_order)
 	return 0;
 
 err_out_free:
-	for (i = 0; i <= buddy->max_order; ++i)
-		kfree(buddy->bits[i]);
-
+	for (i = 0; i <= buddy->max_order; ++i){
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			free_pages((unsigned long)buddy->bits[i], get_order(s));
+		else
+			kfree(buddy->bits[i]);
+	}
 err_out:
 	kfree(buddy->bits);
 	kfree(buddy->num_free);
@@ -178,10 +185,15 @@ err_out:
 
 static void mthca_buddy_cleanup(struct mthca_buddy *buddy)
 {
-	int i;
+	int i, s;
 
-	for (i = 0; i <= buddy->max_order; ++i)
-		kfree(buddy->bits[i]);
+	for (i = 0; i <= buddy->max_order; ++i){
+		s = BITS_TO_LONGS(1 << (buddy->max_order - i)) * sizeof(long);
+		if(s > PAGE_SIZE)
+			free_pages((unsigned long)buddy->bits[i], get_order(s));
+		else
+			kfree(buddy->bits[i]);
+	}
 
 	kfree(buddy->bits);
 	kfree(buddy->num_free);
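
The patch recomputes s at every free site because the release routine has to be chosen by the same size threshold that chose the allocator. One way to picture that pairing is the userland sketch below, which factors the choice into a helper pair. Everything here is an assumption made for illustration: bitmap_alloc(), bitmap_free(), sketch_get_order() and the two counters are hypothetical names, malloc() stands in for both kmalloc() and __get_free_pages(), and SKETCH_PAGE_SIZE assumes 4096-byte pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/*
 * Mimics get_order(): the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size, for size > 0.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		++order;
	}
	return order;
}

/* Counters that show which path each live allocation came from. */
static int pages_outstanding, slab_outstanding;

/* Picks the "page" path above one page, the "slab" path otherwise. */
static void *bitmap_alloc(size_t s)
{
	void *p;

	if (s > SKETCH_PAGE_SIZE) {
		p = malloc((size_t)SKETCH_PAGE_SIZE << sketch_get_order(s));
		if (p)
			++pages_outstanding;
	} else {
		p = malloc(s);
		if (p)
			++slab_outstanding;
	}
	return p;
}

/* Must be given the same s as bitmap_alloc() so the same path is chosen. */
static void bitmap_free(void *p, size_t s)
{
	if (!p)
		return;
	if (s > SKETCH_PAGE_SIZE)
		--pages_outstanding;
	else
		--slab_outstanding;
	free(p);
}
```

Keeping the threshold test in one alloc/free pair means the three call sites in the patch could not drift apart, at the price of passing the size to the free helper.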