Re: [openib-general] ipoib, ipv6 and multicast groups

2007-01-29 Thread Hal Rosenstock
On Mon, 2007-01-29 at 13:17, chas williams - CONTRACTOR wrote:
> recently our sm started throwing the following errors:
> 
> Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
> Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken

IMO, an MLID space of only 32 entries is too small.

> we tracked this down to a problem with ipoib interaction
> with ipv6.  ipv6 joins two multicast groups, instead of 
> just one like ipv4.
> 
>   # netstat -A inet6 -g  -n
>   ...
>   IPv6/IPv4 Group Memberships
>   Interface       RefCnt Group
>   --------------- ------ ---------------------
>   lo              1      ff02::1
>   ib0             1      ff02::1:ff00:77a2
>   ib0             1      ff02::1
> 
> 
>   # netstat -A inet -g  -n
>   ...
>   IPv6/IPv4 Group Memberships
>   Interface       RefCnt Group
>   --------------- ------ ---------------------
>   lo              1      224.0.0.1
>   ib0             1      224.0.0.1
> 
> 
>   # cat /sys/kernel/debug/ipoib/ib0_mcg
>   GID: ff12:401b::0:0:0:0:1
> created: 4298482097
> queuelen: 0
> complete:   yes
> send_only:   no
> 
>   GID: ff12:401b::0:0:0::
> created: 4298482097
> queuelen: 0
> complete:   yes
> send_only:   no
> 
>   GID: ff12:601b::0:0:0:0:1
> created: 4298482097
> queuelen: 0
> complete:   yes
> send_only:   no
> 
>   GID: ff12:601b::0:0:1:ff00:77a2
> created: 4298482097
> queuelen: 0
> complete:   yes
> send_only:   no
> 
> 
> the ff02::1:ff00:77a2 group is the solicited-node multicast address
> (link-local scope), built from the low 24 bits of the interface's own
> ipv6 address, so each of our ib hosts running ipv6 registers its own
> multicast group.  since our network is bigger than 32 hosts, it appears
> that we have exceeded the multicast tables in our local switches and
> this is making opensm generate the above error.
> 
> besides not running ipv6, are there any thoughts about this?

This has been discussed on the list before, most recently in the thread
"IPv6 and IPoIB scalability issue" from late November (11/30) to early
December (12/2). Some options were presented there, but none have been
pursued to the best of my knowledge.

-- Hal

> 
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 



[openib-general] ipoib, ipv6 and multicast groups

2007-01-29 Thread chas williams - CONTRACTOR
recently our sm started throwing the following errors:

Jan 29 18:10:49 706710 [42003940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:49 706721 [42003940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 345113 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 345132 [42804940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 514312 [41802940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken
Jan 29 18:10:51 514320 [41802940] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
Jan 29 18:10:51 735732 [42804940] -> __get_new_mlid: ERR 1B23: All available:32 mlids are taken

we tracked this down to a problem with ipoib interaction
with ipv6.  ipv6 joins two multicast groups, instead of 
just one like ipv4.

# netstat -A inet6 -g  -n
...
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      ff02::1
ib0             1      ff02::1:ff00:77a2
ib0             1      ff02::1


# netstat -A inet -g  -n
...
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
ib0             1      224.0.0.1


# cat /sys/kernel/debug/ipoib/ib0_mcg
GID: ff12:401b::0:0:0:0:1
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:401b::0:0:0::
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:601b::0:0:0:0:1
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:601b::0:0:1:ff00:77a2
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no
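
For reference, a rough sketch of how an IP multicast address turns into one of the GIDs listed above. It follows the IPoIB multicast mapping as I read RFC 4391: 0xff, flags 0x1, a 4-bit scope, a 16-bit signature (0x401b for ipv4, 0x601b for ipv6), the 16-bit P_Key, then an 80-bit group ID. The function name ipoib_mgid is made up for illustration, and the default P_Key of 0xffff and scope of 2 are assumptions, not values read from this fabric.

```python
import ipaddress

def ipoib_mgid(group: str, pkey: int = 0xFFFF, scope: int = 0x2) -> ipaddress.IPv6Address:
    """Map an IPv4/IPv6 multicast address to an IPoIB multicast GID (sketch).

    pkey=0xFFFF (default partition) and scope=2 (link-local) are assumptions.
    """
    ip = ipaddress.ip_address(group)
    if ip.version == 4:
        signature = 0x401B
        group_id = int(ip) & 0x0FFFFFFF        # low 28 bits of the IPv4 group
    else:
        signature = 0x601B
        group_id = int(ip) & ((1 << 80) - 1)   # low 80 bits of the IPv6 group
    prefix = 0xFF10 | scope                    # 0xff, flags = 1, then scope
    mgid = (prefix << 112) | (signature << 96) | (pkey << 80) | group_id
    return ipaddress.IPv6Address(mgid)

# Exploded form shows every 16-bit field without "::" compression.
print(ipoib_mgid("224.0.0.1").exploded)
print(ipoib_mgid("ff02::1:ff00:77a2").exploded)
```

Under these assumptions every distinct IP multicast address yields a distinct GID, which is why each extra ipv6 group shows up as its own entry in ib0_mcg.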


the ff02::1:ff00:77a2 group is the solicited-node multicast address
(link-local scope), built from the low 24 bits of the interface's own
ipv6 address, so each of our ib hosts running ipv6 registers its own
multicast group.  since our network is bigger than 32 hosts, it appears
that we have exceeded the multicast tables in our local switches and
this is making opensm generate the above error.
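
To make the arithmetic concrete, here is a minimal sketch of why the group count grows with the host count. It derives the solicited-node group (ff02::1:ff00:0/104 plus the low 24 bits of the unicast address, per RFC 4291) and counts distinct groups. The host addresses are invented, and two things are assumptions: that the sm needs one mlid per multicast group, and that three groups are shared by all hosts (the three non-host-specific entries in the ib0_mcg listing above).

```python
import ipaddress

# Fixed prefix of every solicited-node multicast address: ff02::1:ff00:0/104
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))

def solicited_node_group(unicast: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)

def groups_needed(unicast_addrs, shared_groups=3):
    """Count distinct multicast groups: shared ones plus one per distinct low-24-bit value.

    shared_groups=3 is an assumption taken from the debugfs listing.
    """
    per_host = {solicited_node_group(a) for a in unicast_addrs}
    return shared_groups + len(per_host)

# Hypothetical link-local addresses, one per host (40 hosts).
hosts = [f"fe80::202:c902:0:{0x77a2 + 4 * i:x}" for i in range(40)]

print(solicited_node_group(hosts[0]))  # ff02::1:ff00:77a2
print(groups_needed(hosts))            # 43, already past a limit of 32
```

With these assumptions the limit of 32 is reached at roughly 29 ipv6 hosts, which matches what we see once the fabric grows past that size.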

besides not running ipv6, are there any thoughts about this?
