On Thu, Sep 18, 2014 at 08:07:31AM +0000, Wodkowski, PawelX wrote:
> > > +int
> > > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > > +         uint8_t slave_pos)
> > > +{
> > > + struct bond_dev_private *internals = bond_dev->data->dev_private;
> > > + struct mode8023ad_data *data = &internals->mode4;
> > > + struct port *port;
> > > + uint8_t i;
> > > +
> > > + bond_mode_8023ad_stop(bond_dev);
> > > +
> > > + /* Exclude slave from transmit policy. If this slave is an aggregator
> > > +  * make all aggregated slaves unselected to force sellection logic
> > > +  * to select suitable aggregator for this port   */
> > > + for (i = 0; i < internals->active_slave_count; i++) {
> > > +         port = &data->port_list[slave_pos];
> > > +         if (port->used_agregator_idx == slave_pos) {
> > > +                 port->selected = UNSELECTED;
> > > +                 port->actor_state &= ~(STATE_SYNCHRONIZATION |
> > STATE_DISTRIBUTING |
> > > +                         STATE_COLLECTING);
> > > +
> > > +                 /* Use default aggregator */
> > > +                 port->used_agregator_idx = i;
> > > +         }
> > > + }
> > > +
> > > + port = &data->port_list[slave_pos];
> > > + timer_cancel(&port->current_while_timer);
> > > + timer_cancel(&port->periodic_timer);
> > > + timer_cancel(&port->wait_while_timer);
> > > + timer_cancel(&port->tx_machine_timer);
> > > +
> > These all seem rather racy.  Alarm callbacks are executed with the alarm 
> > list
> > locks not held.  So there is every possibility that you could execute these 
> > (or
> > any timer_cancel calls in this PMD in parallel with the internal state 
> > machine
> > timer callback, and leave either with a corrupted timer list (resulting 
> > from a
> > double free between here, and the actual callback site),
> 
> I don't think so. Yes, callbacks are executed with  alarm list locks not 
> held, but 
> this is not the issue because access to list itself is guarded by lock and 
> ap->executing variable. So list will not be trashed. Check source of 
> eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().
> 
Yes, you're right, the list is probably safe wht the executing bit.

> > or a timer that is
> > actually still pending when a slave is removed.
> > 
> This is not the issue also, but problem might be similar. I assumed that 
> alarms
> are atomic but when I looked at rte alarms closer I saw a race condition
> between and rte_eal_alarm_cancel() from  bond_mode_8023ad_stop()
> and rte_eal_alarm_set() from state machines callback. This need to be 
> reworked in some way.

Yes, this is what I was referring to:

CPU0                            CPU1
rte_eal_alarm_callback          bond_8023ad_deactivate_slave
-bond_8023_ad_periodic_cb       timer_cancel
timer_set

If those timer functions operate on the same timer, the result is that you can
leave the stop/deactivate slave paths with a timer function for that slave still
pending. The bonding mode needs some internal state to serialize those
operations and determine if the timer should be reactivated.

Neil

Reply via email to