Hi,

On 19:28 Thu 30 Jul, Hal Rosenstock wrote:
> 
> Currently, MADs are pipelined to a single switch at a time, which
> effectively serializes these requests due to processing at the SMA.
> This patch pipelines (stripes) them across the switches first before
> proceeding with successive blocks. As a result of this striping,
> multiple switches can process the Sets and respond concurrently,
> which improves subnet initialization time.
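A minimal standalone C sketch of the two send orderings described above, assuming every switch needs the same number of LFT blocks. struct send, order_serial() and order_striped() are illustrative names, not OpenSM code:

```c
#include <assert.h>
#include <stddef.h>

/* One LFT block Set, identified by switch index and block number
 * (hypothetical illustration, not an OpenSM type). */
struct send {
	int sw;
	int block;
};

/* Current behavior: all blocks of switch 0, then all of switch 1, ... */
static size_t order_serial(int n_sw, int n_blocks, struct send *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (int sw = 0; sw < n_sw; sw++)
		for (int b = 0; b < n_blocks; b++)
			out[n++] = (struct send){ sw, b };
	return n;
}

/* Striped: block 0 of every switch, then block 1 of every switch, ... */
static size_t order_striped(int n_sw, int n_blocks, struct send *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (int b = 0; b < n_blocks; b++)
		for (int sw = 0; sw < n_sw; sw++)
			out[n++] = (struct send){ sw, b };
	return n;
}
```

For 3 switches with 2 blocks each, the serial order begins (0,0), (0,1), (1,0), while the striped order begins (0,0), (1,0), (2,0), so consecutive Sets target different SMAs.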

The idea is nice. However, I have some initial comments about the
implementation.

BTW, is there a reason to keep an option that preserves the current
behavior? (I don't know, just asking.)

> This patch also introduces a new config option (max_smps_per_node),
> which indicates how deep the per-node pipeline is (the current default is 4).
> This also has the effect of limiting the number of times the switch
> list is traversed. Maybe this embellishment is unnecessary.
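Based on osm_ucast_pipeline_tbl() in the patch, max_smps_per_node bounds the number of passes over the switch list; each pass issues at most one changed LFT block per switch, and 0 means keep passing until a pass sends nothing. A rough standalone model of that loop (an interpretation of the diff, not OpenSM code; initial_sends() and its parameters are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* remaining[i] is the number of changed LFT blocks switch i still
 * needs. Each pass sends at most one block per switch; max_per_node
 * == 0 means "no limit". Returns the number of Sets issued up front;
 * later blocks would be sent as responses arrive (not modeled). */
static int initial_sends(int *remaining, int n_sw,
			 unsigned max_per_node, int *passes)
{
	int sent_total = 0;

	assert(passes != NULL);
	*passes = 0;
	for (unsigned i = 0; max_per_node == 0 || i < max_per_node; i++) {
		int sent = 0;

		(*passes)++;
		for (int sw = 0; sw < n_sw; sw++) {
			if (remaining[sw] > 0) {
				remaining[sw]--;
				sent++;
			}
		}
		sent_total += sent;
		if (!sent)	/* like !lfts_updated: stop traversing */
			break;
	}
	return sent_total;
}
```

With remaining blocks {3, 1}, an unlimited setting makes four passes (the last one empty), while max_smps_per_node=2 stops after two passes and leaves one block for the response path. Since the patch's receive handler sends one further block per LFT response, roughly max_smps_per_node Sets per switch would stay in flight.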

Then why is it needed?

> All unicast routing engines are updated for this, with the exception
> of 'file'.
> 
> A similar subsequent change will do this for MFTs.
> 
> Yevgeny Kliteynik <[email protected]> wrote:
> 
> With a small cluster of 17 IS4 switches and 11 HCAs, an LMC of 7
> (including LMC on EnhancedSwitchPort 0) was used to artificially
> increase the size of the cluster.
> 
> With the new code, LFT configuration is more than twice as
> fast as with the old code :)
> The current ucast manager ran for ~250 msec on average; the
> new code took 110-120 msec.
> 
> The routing calculation phase of the ucast manager took ~1200 usec;
> the rest of the time was spent sending the blocks and waiting until
> no more transactions were pending.
> 
> No noticeable difference between various max_smps_per_node values
> was observed.

What is the reason? And what was the value of 'max_wire_smps'?

> Here are some detailed results of different executions (the
> number on the left is the timer value in usec):
> 
> Current ucast manager (w/o the optimization):
> 
> 000000 [LFT]: osm_ucast_mgr_process() - START
> 001131 [LFT]: ucast_mgr_process_tbl() - START
> 032251 [LFT]: ucast_mgr_process_tbl() - END
> 032263 [LFT]: osm_ucast_mgr_process() - END
> 253416 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=0:
> 
> 001417 [LFT]: osm_ucast_mgr_process() - START (0 max_smps_per_node)
> 002690 [LFT]: ucast_mgr_process_tbl() - START
> 032946 [LFT]: ucast_mgr_process_tbl() - END
> 032948 [LFT]: osm_ucast_pipeline_tbl() - START
> 033846 [LFT]: osm_ucast_pipeline_tbl() - END
> 033858 [LFT]: osm_ucast_mgr_process() - END
> 108203 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=1:
> 
> 007474 [LFT]: osm_ucast_mgr_process() - START (1 max_smps_per_node)
> 008735 [LFT]: ucast_mgr_process_tbl() - START
> 040071 [LFT]: ucast_mgr_process_tbl() - END
> 040074 [LFT]: osm_ucast_pipeline_tbl() - START
> 040103 [LFT]: osm_ucast_pipeline_tbl() - END
> 040114 [LFT]: osm_ucast_mgr_process() - END
> 120097 [LFT]: Done wait_for_pending_transactions()
> 
> New code, max_smps_per_node=4:
> 
> 004137 [LFT]: osm_ucast_mgr_process() - START (4 max_smps_per_node)
> 005380 [LFT]: ucast_mgr_process_tbl() - START
> 037436 [LFT]: ucast_mgr_process_tbl() - END
> 037439 [LFT]: osm_ucast_pipeline_tbl() - START
> 037495 [LFT]: osm_ucast_pipeline_tbl() - END
> 037506 [LFT]: osm_ucast_mgr_process() - END
> 114983 [LFT]: Done wait_for_pending_transactions()
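As a sanity check of the "more than twice as fast" claim, the elapsed time of each run can be taken as the "Done wait_for_pending_transactions()" timestamp minus the osm_ucast_mgr_process() START timestamp. A trivial C helper (elapsed_usec() and speedup() are hypothetical names):

```c
#include <assert.h>

/* Elapsed usec between the START line and the
 * "Done wait_for_pending_transactions()" line of one log excerpt. */
static long elapsed_usec(long start_ts, long done_ts)
{
	assert(done_ts >= start_ts);
	return done_ts - start_ts;
}

/* Speedup of a new run relative to a baseline run, as a ratio. */
static double speedup(long baseline_usec, long new_usec)
{
	assert(new_usec > 0);
	return (double)baseline_usec / (double)new_usec;
}
```

With the logs above this gives 253416 usec for the current code versus 106786, 112623 and 110846 usec for max_smps_per_node=0, 1 and 4, i.e. speedups of roughly 2.37x, 2.25x and 2.29x.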
> 
> 
> With IS3-based QLogic switches, which do not handle DR packet forwarding
> in HW, on a fabric of ~1100 HCAs and ~280 switches:
> 
> Current OSM configures LFTs in ~2 seconds.
> The new algorithm does the same job in 1.4-1.6 seconds (30%-20% less time),
> depending on the max_smps_per_node value.
> 
> As in the case of IS4 switches, the shortest config time was obtained with
> max_smps_per_node=0, which means an unlimited pipeline.
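The IS3 percentages are reductions in configuration time relative to the ~2 second baseline. A one-line C helper (pct_reduction() is a hypothetical name) makes the arithmetic explicit:

```c
#include <assert.h>

/* Percentage reduction of configuration time relative to a baseline. */
static double pct_reduction(double baseline_s, double new_s)
{
	assert(baseline_s > 0.0);
	return 100.0 * (baseline_s - new_s) / baseline_s;
}
```

(2.0 - 1.4) / 2.0 gives 30% and (2.0 - 1.6) / 2.0 gives 20%; expressed as a speedup ratio, that is about 1.43x and 1.25x.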
> 
> 
> Signed-off-by: Hal Rosenstock <[email protected]>
> ---
> Changes since v1:
> Added Yevgeny's performance data to patch description above
> No change to actual patch
> 
> diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
> index 0537002..617e8a9 100644
> --- a/opensm/include/opensm/osm_base.h
> +++ b/opensm/include/opensm/osm_base.h
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
>   *
> @@ -449,6 +449,18 @@ BEGIN_C_DECLS
>  */
>  #define OSM_DEFAULT_SMP_MAX_ON_WIRE 4
>  /***********/
> +/****d* OpenSM: Base/OSM_DEFAULT_SMP_MAX_PER_NODE
> +* NAME
> +*    OSM_DEFAULT_SMP_MAX_PER_NODE
> +*
> +* DESCRIPTION
> +*    Specifies the default number of VL15 SMP MADs allowed
> +*    per node for certain attributes.
> +*
> +* SYNOPSIS
> +*/
> +#define OSM_DEFAULT_SMP_MAX_PER_NODE 4
> +/***********/
>  /****d* OpenSM: Base/OSM_SM_DEFAULT_QP0_RCV_SIZE
>  * NAME
>  *    OSM_SM_DEFAULT_QP0_RCV_SIZE
> diff --git a/opensm/include/opensm/osm_sm.h b/opensm/include/opensm/osm_sm.h
> index cc8321d..1776380 100644
> --- a/opensm/include/opensm/osm_sm.h
> +++ b/opensm/include/opensm/osm_sm.h
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -130,6 +130,7 @@ typedef struct osm_sm {
>       osm_sm_mad_ctrl_t mad_ctrl;
>       osm_lid_mgr_t lid_mgr;
>       osm_ucast_mgr_t ucast_mgr;
> +     boolean_t lfts_updated;

The name is unclear; it actually means "update in progress".

>       cl_disp_reg_handle_t sweep_fail_disp_h;
>       cl_disp_reg_handle_t ni_disp_h;
>       cl_disp_reg_handle_t pi_disp_h;
> @@ -524,6 +525,45 @@ osm_resp_send(IN osm_sm_t * sm,
>  *
>  *********/
>  
> +/****f* OpenSM: SM/osm_sm_set_next_lft_block
> +* NAME
> +*    osm_sm_set_next_lft_block
> +*
> +* DESCRIPTION
> +*    Set the next LFT (LinearForwardingTable) block in the indicated switch.
> +*
> +* SYNOPSIS
> +*/
> +void
> +osm_sm_set_next_lft_block(IN osm_sm_t *p_sm, IN osm_switch_t *p_sw,
> +                       IN uint8_t *p_block, IN osm_dr_path_t *p_path,
> +                       IN osm_madw_context_t *p_context);

Why should it be in osm_sm.[ch]? osm_ucast_mgr.c or osm_switch.c seems
a much more appropriate place for this.

> +/*
> +* PARAMETERS
> +*    p_sm
> +*            [in] Pointer to an osm_sm_t object.
> +*
> +*    p_switch
> +*            [in] Pointer to the switch object.
> +*
> +*    p_block
> +*            [in] Pointer to the forwarding table block.
> +*
> +*    p_path
> +*            [in] Pointer to a directed route path object.
> +*
> +*    p_context
> +*            [in] Mad wrapper context structure to be copied into the wrapper
> +*            context, and thus visible to the recipient of the response.
> +*
> +* RETURN VALUES
> +*    None
> +*
> +* NOTES
> +*
> +* SEE ALSO
> +*********/
> +
>  /****f* OpenSM: SM/osm_sm_mcgrp_join
>  * NAME
>  *    osm_sm_mcgrp_join
> diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
> index 59a32ad..f12afae 100644
> --- a/opensm/include/opensm/osm_subnet.h
> +++ b/opensm/include/opensm/osm_subnet.h
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2008 Xsigo Systems Inc.  All rights reserved.
>   *
> @@ -147,6 +147,7 @@ typedef struct osm_subn_opt {
>       uint32_t sweep_interval;
>       uint32_t max_wire_smps;
>       uint32_t transaction_timeout;
> +     uint32_t max_smps_per_node;
>       uint8_t sm_priority;
>       uint8_t lmc;
>       boolean_t lmc_esp0;
> diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
> index 7ce28c5..e12113f 100644
> --- a/opensm/include/opensm/osm_switch.h
> +++ b/opensm/include/opensm/osm_switch.h
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -102,6 +102,7 @@ typedef struct osm_switch {
>       osm_port_profile_t *p_prof;
>       uint8_t *lft;
>       uint8_t *new_lft;
> +     uint16_t lft_block_id_ho;
>       osm_mcast_tbl_t mcast_tbl;
>       unsigned endport_links;
>       unsigned need_update;
> diff --git a/opensm/include/opensm/osm_ucast_mgr.h b/opensm/include/opensm/osm_ucast_mgr.h
> index a040476..fdea49a 100644
> --- a/opensm/include/opensm/osm_ucast_mgr.h
> +++ b/opensm/include/opensm/osm_ucast_mgr.h
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -233,17 +233,42 @@ osm_ucast_mgr_init(IN osm_ucast_mgr_t * const p_mgr, IN struct osm_sm * sm);
>  *    osm_ucast_mgr_destroy
>  *********/
>  
> -/****f* OpenSM: Unicast Manager/osm_ucast_mgr_set_fwd_table
> +/****f* OpenSM: Unicast Manager/osm_ucast_pipeline_tbl
>  * NAME
> -*    osm_ucast_mgr_set_fwd_table
> +*    osm_ucast_pipeline_tbl
>  *
>  * DESCRIPTION
> -*    Setup forwarding table for the switch (from prepared new_lft).
> +*    The osm_ucast_pipeline_tbl function pipelines the LFT
> +*    (LinearForwardingTable) sets across the switches
> +*    (from prepared new_lft).
>  *
>  * SYNOPSIS
>  */
> -int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * const p_mgr,
> -                             IN osm_switch_t * const p_sw);
> +void osm_ucast_pipeline_tbl(IN osm_ucast_mgr_t * p_mgr);
> +/*
> +* PARAMETERS
> +*    p_mgr
> +*            [in] Pointer to an osm_ucast_mgr_t object.
> +*
> +* RETURN VALUES
> +*    None.
> +*
> +* NOTES
> +*
> +* SEE ALSO
> +*********/
> +
> +/****f* OpenSM: Unicast Manager/osm_ucast_mgr_set_fwd_tbl_top
> +* NAME
> +*    osm_ucast_mgr_set_fwd_tbl_top
> +*
> +* DESCRIPTION
> +*    Setup LinearFDBTop for the switch.
> +*
> +* SYNOPSIS
> +*/
> +int osm_ucast_mgr_set_fwd_tbl_top(IN osm_ucast_mgr_t * const p_mgr,
> +                               IN osm_switch_t * const p_sw);

I don't really like this separation (osm_ucast_mgr_set_fwd_tbl_top and
osm_ucast_pipeline_tbl). Why not use a single function and update all
routing engines appropriately (you need to do that anyway), so that they
only fill up the new_lft tables?

>  /*
>  * PARAMETERS
>  *    p_mgr
> diff --git a/opensm/opensm/osm_lin_fwd_rcv.c b/opensm/opensm/osm_lin_fwd_rcv.c
> index 2edb8d3..cb131b4 100644
> --- a/opensm/opensm/osm_lin_fwd_rcv.c
> +++ b/opensm/opensm/osm_lin_fwd_rcv.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -36,7 +36,7 @@
>  /*
>   * Abstract:
>   *    Implementation of osm_lft_rcv_t.
> - * This object represents the NodeDescription Receiver object.
> + * This object represents the Linear Forwarding Table Receiver object.
>   * This object is part of the opensm family of objects.
>   */
>  
> @@ -55,6 +55,7 @@ void osm_lft_rcv_process(IN void *context, IN void *data)
>  {
>       osm_sm_t *sm = context;
>       osm_madw_t *p_madw = data;
> +     osm_dr_path_t *p_path;
>       ib_smp_t *p_smp;
>       uint32_t block_num;
>       osm_switch_t *p_sw;
> @@ -62,6 +63,8 @@ void osm_lft_rcv_process(IN void *context, IN void *data)
>       uint8_t *p_block;
>       ib_net64_t node_guid;
>       ib_api_status_t status;
> +     uint8_t block[IB_SMP_DATA_SIZE];
> +     osm_madw_context_t mad_context;
>  
>       CL_ASSERT(sm);
>  
> @@ -94,6 +97,16 @@ void osm_lft_rcv_process(IN void *context, IN void *data)
>                               "\n\t\t\t\tSwitch 0x%" PRIx64 "\n",
>                               ib_get_err_str(status), cl_ntoh64(node_guid));
>               }
> +
> +             p_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_sw->p_node, 0));
> +
> +             mad_context.lft_context.node_guid = node_guid;
> +             mad_context.lft_context.set_method = TRUE;
> +
> +             osm_sm_set_next_lft_block(sm, p_sw, &block[0], p_path,
> +                                       &mad_context);
> +
> +             p_sw->lft_block_id_ho++;

Wouldn't it be simpler to encode block_id in the MAD context?
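One way to read this suggestion: carry the block number in the per-MAD context, so the response handler derives the next block from the MAD it received instead of from per-switch state such as lft_block_id_ho. A standalone C sketch; struct lft_ctx and next_changed_block() are hypothetical and do not correspond to existing OpenSM structures, and BLOCK_SIZE stands in for IB_SMP_DATA_SIZE (64 one-byte port entries per LFT block):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* stand-in for IB_SMP_DATA_SIZE */

/* Hypothetical per-MAD context carrying the block it refers to. */
struct lft_ctx {
	uint64_t node_guid;
	uint16_t block_id;	/* block acknowledged by this response */
};

/* Return the first block after ctx->block_id whose new contents differ
 * from the current ones, or -1 if nothing is left to send. */
static int next_changed_block(const struct lft_ctx *ctx,
			      const uint8_t *cur_lft, const uint8_t *new_lft,
			      int n_blocks)
{
	assert(ctx != NULL);
	for (int b = ctx->block_id + 1; b < n_blocks; b++)
		if (memcmp(cur_lft + b * BLOCK_SIZE,
			   new_lft + b * BLOCK_SIZE, BLOCK_SIZE))
			return b;
	return -1;
}
```

In this form the next block to send depends only on the received MAD and the two tables, not on a mutable per-switch counter.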

>       }
>  
>       CL_PLOCK_RELEASE(sm->p_lock);
> diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c
> index daa60ff..4e0fd2a 100644
> --- a/opensm/opensm/osm_sm.c
> +++ b/opensm/opensm/osm_sm.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2008 Xsigo Systems Inc.  All rights reserved.
>   *
> @@ -441,6 +441,45 @@ Exit:
>  
>  /**********************************************************************
>   **********************************************************************/
> +void osm_sm_set_next_lft_block(IN osm_sm_t *p_sm, IN osm_switch_t *p_sw,
> +                            IN uint8_t *p_block, IN osm_dr_path_t *p_path,
> +                            IN osm_madw_context_t *context)
> +{
> +     ib_api_status_t status;
> +
> +     for (;
> +          osm_switch_get_lft_block(p_sw, p_sw->lft_block_id_ho, p_block);
> +          p_sw->lft_block_id_ho++) {
> +             if (!p_sw->need_update && !p_sm->p_subn->need_update &&
> +                 !memcmp(p_block,
> +                         p_sw->new_lft + p_sw->lft_block_id_ho * IB_SMP_DATA_SIZE,
> +                         IB_SMP_DATA_SIZE))
> +                     continue;
> +
> +             p_sm->lfts_updated = 1;
> +
> +             OSM_LOG(p_sm->p_log, OSM_LOG_DEBUG,
> +                     "Writing FT block %u to switch 0x%" PRIx64 "\n",
> +                     p_sw->lft_block_id_ho,
> +                     cl_ntoh64(context->lft_context.node_guid));
> +
> +             status = osm_req_set(p_sm, p_path,
> +                                  p_sw->new_lft +
> +                                  p_sw->lft_block_id_ho * IB_SMP_DATA_SIZE,
> +                                  IB_SMP_DATA_SIZE, IB_MAD_ATTR_LIN_FWD_TBL,
> +                                  cl_hton32(p_sw->lft_block_id_ho),
> +                                  CL_DISP_MSGID_NONE, context);
> +
> +             if (status != IB_SUCCESS)
> +                     OSM_LOG(p_sm->p_log, OSM_LOG_ERROR, "ERR 2E11: "
> +                             "Sending linear fwd. tbl. block failed (%s)\n",
> +                             ib_get_err_str(status));
> +             break;
> +     }
> +}
> +
> +/**********************************************************************
> + **********************************************************************/
>  static ib_api_status_t sm_mgrp_process(IN osm_sm_t * p_sm,
>                                      IN osm_mgrp_t * p_mgrp)
>  {
> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
> index ec15f8a..1964b7f 100644
> --- a/opensm/opensm/osm_subnet.c
> +++ b/opensm/opensm/osm_subnet.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2008 Xsigo Systems Inc.  All rights reserved.
>   *
> @@ -295,6 +295,7 @@ static const opt_rec_t opt_tbl[] = {
>       { "m_key_lease_period", OPT_OFFSET(m_key_lease_period), opts_parse_net16, NULL, 1 },
>       { "sweep_interval", OPT_OFFSET(sweep_interval), opts_parse_uint32, NULL, 1 },
>       { "max_wire_smps", OPT_OFFSET(max_wire_smps), opts_parse_uint32, NULL, 1 },
> +     { "max_smps_per_node", OPT_OFFSET(max_smps_per_node), opts_parse_uint32, NULL, 1 },
>       { "console", OPT_OFFSET(console), opts_parse_charp, NULL, 0 },
>       { "console_port", OPT_OFFSET(console_port), opts_parse_uint16, NULL, 0 },
>       { "transaction_timeout", OPT_OFFSET(transaction_timeout), opts_parse_uint32, NULL, 1 },
> @@ -671,6 +672,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt)
>       p_opt->m_key_lease_period = 0;
>       p_opt->sweep_interval = OSM_DEFAULT_SWEEP_INTERVAL_SECS;
>       p_opt->max_wire_smps = OSM_DEFAULT_SMP_MAX_ON_WIRE;
> +     p_opt->max_smps_per_node = OSM_DEFAULT_SMP_MAX_PER_NODE;
>       p_opt->console = strdup(OSM_DEFAULT_CONSOLE);
>       p_opt->console_port = OSM_DEFAULT_CONSOLE_PORT;
>       p_opt->transaction_timeout = OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC;
> @@ -1461,6 +1463,10 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t *const p_opts)
>               "max_wire_smps %u\n\n"
>               "# The maximum time in [msec] allowed for a transaction to complete\n"
>               "transaction_timeout %u\n\n"
> +             "# Maximum number of SMPs per node sent in parallel\n"
> +             "# (0 means unlimited)\n"
> +             "# Only applies to certain attributes\n"
> +             "max_smps_per_node %u\n\n"
>               "# Maximal time in [msec] a message can stay in the incoming message queue.\n"
>               "# If there is more than one message in the queue and the last message\n"
>               "# stayed in the queue more than this value, any SA request will be\n"
> @@ -1470,6 +1476,7 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t *const p_opts)
>               "single_thread %s\n\n",
>               p_opts->max_wire_smps,
>               p_opts->transaction_timeout,
> +             p_opts->max_smps_per_node,
>               p_opts->max_msg_fifo_timeout,
>               p_opts->single_thread ? "TRUE" : "FALSE");
>  
> diff --git a/opensm/opensm/osm_ucast_cache.c b/opensm/opensm/osm_ucast_cache.c
> index 216b496..31c930b 100644
> --- a/opensm/opensm/osm_ucast_cache.c
> +++ b/opensm/opensm/osm_ucast_cache.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2008      Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2008,2009 Mellanox Technologies LTD. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -1085,9 +1085,11 @@ int osm_ucast_cache_process(osm_ucast_mgr_t * p_mgr)
>                       memset(p_sw->lft, OSM_NO_PATH, IB_LID_UCAST_END_HO + 1);
>               }
>  
> -             osm_ucast_mgr_set_fwd_table(p_mgr, p_sw);
> +             osm_ucast_mgr_set_fwd_tbl_top(p_mgr, p_sw);
>       }
>  
> +     osm_ucast_pipeline_tbl(p_mgr);
> +
>       return 0;
>  }
>  
> diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c
> index 2505c46..099e8ba 100644
> --- a/opensm/opensm/osm_ucast_file.c
> +++ b/opensm/opensm/osm_ucast_file.c
> @@ -168,8 +168,8 @@ static int do_ucast_file_load(void *context)
>                               "routing algorithm\n");
>               } else if (!strncmp(p, "Unicast lids", 12)) {
>                       if (p_sw)
> -                             osm_ucast_mgr_set_fwd_table(&p_osm->sm.
> -                                                         ucast_mgr, p_sw);
> +                             osm_ucast_mgr_set_fwd_tbl_top(&p_osm->sm.
> +                                                           ucast_mgr, p_sw);
>                       q = strstr(p, " guid 0x");
>                       if (!q) {
>                               OSM_LOG(&p_osm->log, OSM_LOG_ERROR,
> @@ -247,7 +247,7 @@ static int do_ucast_file_load(void *context)
>       }
>  
>       if (p_sw)
> -             osm_ucast_mgr_set_fwd_table(&p_osm->sm.ucast_mgr, p_sw);
> +             osm_ucast_mgr_set_fwd_tbl_top(&p_osm->sm.ucast_mgr, p_sw);
>  
>       fclose(file);
>       return 0;

I suspect this breaks the 'file' routing engine (did you test it?):
instead of setting up the switch LFTs, it will only update their
LinearFDBTop values.

> diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c
> index bde6dbd..d65c685 100644
> --- a/opensm/opensm/osm_ucast_ftree.c
> +++ b/opensm/opensm/osm_ucast_ftree.c
> @@ -2,7 +2,7 @@
>   * Copyright (c) 2009 Simula Research Laboratory. All rights reserved.
>   * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2007 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -1905,8 +1905,8 @@ static void set_sw_fwd_table(IN cl_map_item_t * const p_map_item,
>       ftree_fabric_t *p_ftree = (ftree_fabric_t *) context;
>  
>       p_sw->p_osm_sw->max_lid_ho = p_ftree->lft_max_lid;
> -     osm_ucast_mgr_set_fwd_table(&p_ftree->p_osm->sm.ucast_mgr,
> -                                 p_sw->p_osm_sw);
> +     osm_ucast_mgr_set_fwd_tbl_top(&p_ftree->p_osm->sm.ucast_mgr,
> +                                   p_sw->p_osm_sw);
>  }
>  
>  /***************************************************
> @@ -4005,6 +4005,8 @@ static int do_routing(IN void *context)
>       /* for each switch, set its fwd table */
>       cl_qmap_apply_func(&p_ftree->sw_tbl, set_sw_fwd_table, (void *)p_ftree);
>  
> +     osm_ucast_pipeline_tbl(&p_ftree->p_osm->sm.ucast_mgr);
> +
>       /* write out hca ordering file */
>       fabric_dump_hca_ordering(p_ftree);
>  
> diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c
> index 12b5e34..adf5f6c 100644
> --- a/opensm/opensm/osm_ucast_lash.c
> +++ b/opensm/opensm/osm_ucast_lash.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2007      Simula Research Laboratory. All rights reserved.
>   * Copyright (c) 2007      Silicon Graphics Inc. All rights reserved.
> @@ -1045,8 +1045,11 @@ static void populate_fwd_tbls(lash_t * p_lash)
>                                       physical_egress_port);
>                       }
>               }               /* for */
> -             osm_ucast_mgr_set_fwd_table(&p_osm->sm.ucast_mgr, p_sw);
> +             osm_ucast_mgr_set_fwd_tbl_top(&p_osm->sm.ucast_mgr, p_sw);
>       }
> +
> +     osm_ucast_pipeline_tbl(&p_osm->sm.ucast_mgr);
> +
>       OSM_LOG_EXIT(p_log);
>  }
>  
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index 78a7031..86d1c98 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -1,6 +1,6 @@
>  /*
>   * Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved.
> - * Copyright (c) 2002-2008 Mellanox Technologies LTD. All rights reserved.
> + * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved.
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
> @@ -315,16 +315,14 @@ Exit:
>  
>  /**********************************************************************
>   **********************************************************************/
> -int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * p_mgr,
> -                             IN osm_switch_t * p_sw)
> +int osm_ucast_mgr_set_fwd_tbl_top(IN osm_ucast_mgr_t * p_mgr,
> +                               IN osm_switch_t * p_sw)
>  {
>       osm_node_t *p_node;
>       osm_dr_path_t *p_path;
>       osm_madw_context_t context;
>       ib_api_status_t status;
>       ib_switch_info_t si;
> -     uint16_t block_id_ho = 0;
> -     uint8_t block[IB_SMP_DATA_SIZE];
>       boolean_t set_swinfo_require = FALSE;
>       uint16_t lin_top;
>       uint8_t life_state;
> @@ -382,48 +380,8 @@ int osm_ucast_mgr_set_fwd_table(IN osm_ucast_mgr_t * p_mgr,
>                               ib_get_err_str(status));
>       }
>  
> -     /*
> -        Send linear forwarding table blocks to the switch
> -        as long as the switch indicates it has blocks needing
> -        configuration.
> -      */
> -
> -     context.lft_context.node_guid = osm_node_get_node_guid(p_node);
> -     context.lft_context.set_method = TRUE;
> -
> -     if (!p_sw->new_lft) {
> -             /* any routing should provide the new_lft */
> -             CL_ASSERT(p_mgr->p_subn->opt.use_ucast_cache &&
> -                       p_mgr->cache_valid && !p_sw->need_update);
> -             goto Exit;
> -     }
> -
> -     for (block_id_ho = 0;
> -          osm_switch_get_lft_block(p_sw, block_id_ho, block);
> -          block_id_ho++) {
> -             if (!p_sw->need_update && !p_mgr->p_subn->need_update &&
> -                 !memcmp(block,
> -                         p_sw->new_lft + block_id_ho * IB_SMP_DATA_SIZE,
> -                         IB_SMP_DATA_SIZE))
> -                     continue;
> -
> -             OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
> -                     "Writing FT block %u\n", block_id_ho);
> -
> -             status = osm_req_set(p_mgr->sm, p_path,
> -                                  p_sw->new_lft +
> -                                  block_id_ho * IB_SMP_DATA_SIZE,
> -                                  sizeof(block), IB_MAD_ATTR_LIN_FWD_TBL,
> -                                  cl_hton32(block_id_ho), CL_DISP_MSGID_NONE,
> -                                  &context);
> +     p_sw->lft_block_id_ho = 0;
>  
> -             if (status != IB_SUCCESS)
> -                     OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A05: "
> -                             "Sending linear fwd. tbl. block failed (%s)\n",
> -                             ib_get_err_str(status));
> -     }
> -
> -Exit:
>       OSM_LOG_EXIT(p_mgr->p_log);
>       return 0;
>  }
> @@ -508,7 +466,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
>               }
>       }
>  
> -     osm_ucast_mgr_set_fwd_table(p_mgr, p_sw);
> +     osm_ucast_mgr_set_fwd_tbl_top(p_mgr, p_sw);
>  
>       if (p_mgr->p_subn->opt.lmc)
>               free_ports_priv(p_mgr);
> @@ -516,6 +474,47 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
>       OSM_LOG_EXIT(p_mgr->p_log);
>  }
>  
> +static void ucast_mgr_pipeline_tbl(IN osm_switch_t *p_sw,
> +                                IN osm_ucast_mgr_t *p_mgr)
> +{
> +     osm_dr_path_t *p_path;
> +     osm_madw_context_t mad_context;
> +     uint8_t block[IB_SMP_DATA_SIZE];
> +
> +     OSM_LOG_ENTER(p_mgr->p_log);
> +
> +     CL_ASSERT(p_sw && p_sw->p_node);
> +
> +     OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
> +             "Processing switch 0x%" PRIx64 "\n",
> +             cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
> +
> +     /*
> +        Send linear forwarding table blocks to the switch
> +        as long as the switch indicates it has blocks needing
> +        configuration.
> +      */
> +     if (!p_sw->new_lft) {
> +             /* any routing should provide the new_lft */
> +             CL_ASSERT(p_mgr->p_subn->opt.use_ucast_cache &&
> +                       p_mgr->cache_valid && !p_sw->need_update);
> +             goto Exit;
> +     }
> +
> +     p_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_sw->p_node, 0));
> +
> +     mad_context.lft_context.node_guid = osm_node_get_node_guid(p_sw->p_node);
> +     mad_context.lft_context.set_method = TRUE;
> +
> +     osm_sm_set_next_lft_block(p_mgr->sm, p_sw, &block[0], p_path,
> +                               &mad_context);
> +
> +     p_sw->lft_block_id_ho++;
> +
> +Exit:
> +     OSM_LOG_EXIT(p_mgr->p_log);
> +}
> +
>  /**********************************************************************
>   **********************************************************************/
>  static void ucast_mgr_process_neighbors(IN cl_map_item_t * p_map_item,
> @@ -870,6 +869,28 @@ static void sort_ports_by_switch_load(osm_ucast_mgr_t * m)
>               add_sw_endports_to_order_list(s[i], m);
>  }
>  
> +void osm_ucast_pipeline_tbl(osm_ucast_mgr_t * p_mgr)
> +{
> +     cl_qmap_t *p_sw_tbl;
> +     osm_switch_t *p_sw;
> +     int i;
> +
> +     for (i = 0;
> +          !p_mgr->p_subn->opt.max_smps_per_node ||
> +          i < p_mgr->p_subn->opt.max_smps_per_node;
> +          i++) {
> +             p_mgr->sm->lfts_updated = 0;
> +             p_sw_tbl = &p_mgr->p_subn->sw_guid_tbl;
> +             p_sw = (osm_switch_t *) cl_qmap_head(p_sw_tbl);
> +             while (p_sw != (osm_switch_t *) cl_qmap_end(p_sw_tbl)) {
> +                     ucast_mgr_pipeline_tbl(p_sw, p_mgr);
> +                     p_sw = (osm_switch_t *) cl_qmap_next(&p_sw->map_item);
> +             }
> +             if (!p_mgr->sm->lfts_updated)
> +                     break;
> +     }
> +}

Is it possible (for example, in case of send errors) that only a "partial"
set of LFT blocks gets sent and wait_for_pending_transactions() still
completes?
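The scenario can be illustrated with a small standalone model. It assumes, as the diff suggests, that each further block is only sent from the LFT response handler, and additionally assumes (not verified against OpenSM) that a Set whose send fails is never counted as outstanding. Everything here, including simulate_lft_chains(), is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SW 8

/* n_blocks[i] blocks are needed on switch i; block b is only sent when
 * the response for block b-1 arrives. fail[i] is the block whose send
 * fails on switch i (-1 for none); a failed send is not pending, so no
 * response comes back and that switch's chain stops. Returns the number
 * of responses; *unsent_total gets the blocks never sent. */
static int simulate_lft_chains(const int *n_blocks, const int *fail,
			       int n_sw, int *unsent_total)
{
	int next[MAX_SW] = { 0 };
	bool outstanding[MAX_SW] = { false };
	int pending = 0, responses = 0;

	assert(n_sw <= MAX_SW);
	/* Initial striped pass: block 0 to every switch. */
	for (int i = 0; i < n_sw; i++) {
		if (next[i] < n_blocks[i] && next[i] != fail[i]) {
			outstanding[i] = true;
			pending++;
		}
	}
	/* Deliver responses until nothing is pending, which is the point
	 * at which a wait-for-pending-transactions loop would return. */
	while (pending > 0) {
		for (int i = 0; i < n_sw; i++) {
			if (!outstanding[i])
				continue;
			outstanding[i] = false;
			pending--;
			responses++;
			next[i]++;	/* the handler advances ... */
			if (next[i] < n_blocks[i] && next[i] != fail[i]) {
				outstanding[i] = true;	/* ... and sends */
				pending++;
			}
		}
	}
	*unsent_total = 0;
	for (int i = 0; i < n_sw; i++)
		*unsent_total += n_blocks[i] - next[i];
	return responses;
}
```

With two switches of three blocks each and a failure on block 1 of the second switch, the model drains after four responses with two blocks never sent. Whether OpenSM would then treat the sweep as complete depends on how it accounts for the failed request, which this model does not capture.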

Sasha

> +
>  static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
>  {
>       cl_qlist_init(&p_mgr->port_order_list);
> @@ -904,6 +925,8 @@ static int ucast_mgr_build_lfts(osm_ucast_mgr_t * p_mgr)
>       cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, ucast_mgr_process_tbl,
>                          p_mgr);
>  
> +     osm_ucast_pipeline_tbl(p_mgr);
> +
>       cl_qlist_remove_all(&p_mgr->port_order_list);
>  
>       return 0;
> 
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
