On 05/05/2016 04:51 PM, Saeed Mahameed wrote:
> On Thu, May 5, 2016 at 8:16 PM, Doug Ledford <dledf...@redhat.com> wrote:
>>
>> That depends on which interface actually generated the oops. If it was
>> the base interface, then I don't manually set any special params on it.
>> If it's one of the vlan interfaces, then there is a NetworkManager
>> dispatcher script that is intended to set the tc count on interface up:
>>
>> [root@rdma-virt-03 ~]$ more /etc/NetworkManager/dispatcher.d/98-mlx5_roce.4*
>> ::::::::::::::
>> /etc/NetworkManager/dispatcher.d/98-mlx5_roce.43-egress.conf
>> ::::::::::::::
>> #!/bin/sh
>> interface=$1
>> status=$2
>> [ "$interface" = mlx5_roce.43 ] || exit 0
>> case $status in
>> up)
>> tc qdisc add dev mlx5_roce root mqprio num_tc 8 map 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
>
> Well, here you are configuring 8 TCs on the base mlx5 interface, so
> the answer to my question is yes.
Correct. I mentioned that at the end of my email ;-)

> It appears that we have a bug in mlx5e_select_queue
>
> int channel_ix = fallback(dev, skb);
> return priv->channeltc_to_txq_map[channel_ix][tc];
>
> When num_tc > 1 the fallback can return any value between [0..
> num_channels * num_tc ]
>
> while channeltc_to_txq_map is an array of the size num_channels.
>
> so there is a good chance that channel_ix exceeds the array limits,
> resulting in an OOPs.
>
>> # tc_wrap.py -i mlx5_roce -u 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
>> ;;
>> esac
>> --More--(Next file:
>> /etc/NetworkManager/dispatcher.d/98-mlx5_roce.45-egress.conf::::::::::::::
>> /etc/NetworkManager/dispatcher.d/98-mlx5_roce.45-egress.conf
>> ::::::::::::::
>> #!/bin/sh
>> interface=$1
>> status=$2
>> [ "$interface" = mlx5_roce.45 ] || exit 0
>> case $status in
>> up)
>> tc qdisc add dev mlx5_roce root mqprio num_tc 8 map 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
>
> Well, here you map all user skb prios (skb->priority) to HW tc 5.
> BTW skprio or user prio in this example is never the vlan prio; it is
> the ipv4 (ToS).
>
> please see http://lartc.org/manpages/tc-prio.html

Ok.

> So to achieve a vlan prio to HW tc mapping, you will need to map the
> skprios to vlan prios using vlan egress mapping
> which i see you already do down below.

I do, and this is all related to trying to get PFC working for RoCE on
these cards. For the most part, the things you see here are documented
in the Mellanox guides related to RoCE setup, or they are things I
pulled from the tc_wrap.py program that you guys distribute for setting
this stuff up.

> But, our select queue implementation will extract the vlan priority
> and use the corresponding TC from our own
> priv->channeltc_to_txq_map[channel_ix][up] mapping
> where up is vlan user priority. but this only applies to kernel
> traffic, i don't see why it is needed for RoCE.

Read your own guides ;-).
I'm using this one for your switches:

https://community.mellanox.com/docs/DOC-1417

And these to try and get the linux machines configured properly:

https://community.mellanox.com/docs/DOC-1414
https://community.mellanox.com/docs/DOC-1415
https://community.mellanox.com/docs/DOC-2311
https://community.mellanox.com/docs/DOC-2474
http://www.mellanox.com/related-docs/prod_software/RoCE_with_Priority_Flow_Control_Application_Guide.pdf

The guides are helpful if your setup allows you to follow their exact
example. But they are short on information about how to modify the
examples to your specific situation. For instance, I have to use vlan
priority 5 as my no-drop priority for RoCE traffic. I can't reliably
tell in which portions of the guide I must switch the 3s to 5s in order
to get the new priority, and which uses of 3s in the guides relate to
other things that could be mapped to 5.

On a separate note, it's unclear to me if your switches and cards
support more than one no-drop priority (other vendors' RoCE cards I'm
using here don't; they only allow one no-drop priority for RoCE traffic
and it must be 5). If they do support more than one, I'd actually like
both 3 and 5 to be no-drop and for one vlan to use 3 and another to
use 5.

> As i said above configuring any num_tc > 1 might cause the panic you saw.
>
> Regarding the proper mapping to do for 45 => priority 5, 43 => prio 3.
> the egress mappings you already did above should be sufficient, the
> question is, do you need the vlan priorities to be mapped to a
> specific HW TC dispatchers ?

You'd have to tell me. The switch docs make it clear that it's best if
no-drop priorities are mapped to TC1 or TC2 (which is not necessarily
the same as the TC mapping you refer to here as far as I know, but it
might be similar). The doc on setting up ConnectX-4 cards talks about
the same basic TC dispatchers on the card, but instead of 4 like the
switches have, there are 8.
So, does the card's built-in firmware/silicon have a preference for
where no-drop traffic is queued via TC dispatchers like the switches do?

>
> if not, then you don't need to configure "tc qdisc add dev mlx5_roce
> root ..." at all.

That appears to be a question for Mellanox to answer. I can't say.

--
Doug Ledford <dledf...@redhat.com>
GPG KeyID: 0E572FDD
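
For illustration, the out-of-range index described in the quoted
mlx5e_select_queue analysis can be modelled in a few lines of Python.
This is only a sketch under stated assumptions: the channel count, the
txq numbering, the helper names and the modulo guard are all made up for
the example. It is not the driver's code and not necessarily the fix
that Mellanox would apply.

```python
# Toy model of the indexing problem discussed above; not driver code.
NUM_CHANNELS = 4   # assumed channel count (hypothetical value)
NUM_TC = 8         # matches "mqprio num_tc 8" in the dispatcher script

# channeltc_to_txq_map[channel][tc] -> txq index.
# Only NUM_CHANNELS rows exist. The txq numbering is an assumption.
channeltc_to_txq_map = [
    [ch + tc * NUM_CHANNELS for tc in range(NUM_TC)]
    for ch in range(NUM_CHANNELS)
]


def select_queue_unchecked(fallback_ix, tc):
    """Use the fallback value directly as the row index, as in the quoted snippet."""
    return channeltc_to_txq_map[fallback_ix][tc]


def select_queue_clamped(fallback_ix, tc):
    """One possible guard (an assumption): fold the value back into channel range."""
    return channeltc_to_txq_map[fallback_ix % NUM_CHANNELS][tc]


def out_of_range_count(tc=5):
    """Count fallback values in [0, NUM_CHANNELS * NUM_TC) that overrun the map."""
    bad = 0
    for ix in range(NUM_CHANNELS * NUM_TC):
        try:
            select_queue_unchecked(ix, tc)
        except IndexError:
            bad += 1
    return bad
```

With these assumed sizes, only the first NUM_CHANNELS fallback values
land inside the map; every other value raises IndexError here. In C the
same access would simply read past the end of the array, which is
consistent with the oops being discussed.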