Hi Christopher, first, thank you for this one, I know how painful it can have been!
On Wed, Mar 14, 2018 at 09:56:19PM +0100, Christopher Faulet wrote: (...) > But it is not optimal and in some situations, it could be really > slower in multi-threaded mode than in single-threaded one. The problem is > that, > when we try to dequeue pending connections, we process it from the older one > to > the newer one independently to the thread's affinity. So we need to wait the > other threads' wakeup to really process them. If threads are blocked in the > poller, this will add a significant latency. This problem happens when maxconn > values are very low. This problem is not related to your patch but to the task scheduler, which we need to improve^Wfix so that it can properly wake up a task on a sleeping thread. We need to check this with Emeric and Olivier to see how to best address it in 1.8 and how to ensure that 1.9 is clean regarding this. I'm having a few comments on the code below : > @@ -97,44 +76,66 @@ static inline struct pendconn *pendconn_from_px(const > struct proxy *px) { > * The stream is immediately marked as "assigned", and both its <srv> and > * <srv_conn> are set to <srv>, > */ > -static struct stream *pendconn_get_next_strm(struct server *srv, struct > proxy *px) > +static struct stream *pendconn_process_next_strm(struct server *srv, struct > proxy *px) > { > - struct pendconn *ps, *pp; > - struct stream *strm; > - struct server *rsrv; > + struct pendconn *p = NULL; > + struct server *rsrv; > > rsrv = srv->track; > if (!rsrv) > rsrv = srv; > > - ps = pendconn_from_srv(srv); > - pp = pendconn_from_px(px); > - /* we want to get the definitive pendconn in <ps> */ > - if (!pp || !srv_currently_usable(rsrv)) { > - if (!ps) > - return NULL; > - } else { > - /* pendconn exists in the proxy queue */ > - if (!ps || tv_islt(&pp->strm->logs.tv_request, > &ps->strm->logs.tv_request)) > - ps = pp; > + if (srv->nbpend) { > + list_for_each_entry(p, &srv->pendconns, list) { > + if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock)) > + goto ps_found; > + } > + p = NULL; > + } > + > + ps_found: > + if (srv_currently_usable(rsrv) && px->nbpend) { > + struct pendconn *pp; > + > + list_for_each_entry(pp, &px->pendconns, list) { > + /* If the server pendconn is older than the proxy one, > + * we process the server one. */ > + if (p && !tv_islt(&pp->strm->logs.tv_request, > &p->strm->logs.tv_request)) > + goto pendconn_found; Here there is still a race : pp->strm is dereferenced out of the lock, so the owner could very well decide to disappear in the mean time. I think this check needs to be moved in the trylock below. Or another option (probably simpler and cleaner) would be to add a tv_request field to the pendconn, and directly use it. > + > + if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &pp->lock)) { > + /* Let's switch from the server pendconn to the > + * proxy pendconn. Don't forget to unlock the > + * server pendconn, if any. */ > + if (p) > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > + p = pp; > + goto pendconn_found; > + } > + } > } > - strm = ps->strm; > - __pendconn_free(ps); > > - /* we want to note that the stream has now been assigned a server */ > - strm->flags |= SF_ASSIGNED; > - strm->target = &srv->obj_type; > - __stream_add_srv_conn(strm, srv); > + if (!p) > + return NULL; > + > + pendconn_found: > + pendconn_unlinked(p); > + p->strm_flags = (p->strm_flags & (SF_DIRECT | SF_ADDR_SET)); > + p->strm_flags |= SF_ASSIGNED; I think that it would be simpler to consider that strm_flags start with the stream's initial flags, then only manipulate the ones we're interested in, and only apply the final mask at the end. This way the enumeration of all manipulated flags doesn't need to be known at each location. This means you should only keep SF_ASSIGNED above and drop the line filterinng on the two other flags. See below for the other instances. > @@ -206,26 +204,28 @@ struct pendconn *pendconn_add(struct stream *strm) > */ > int pendconn_redistribute(struct server *s) > { > - struct pendconn *pc, *pc_bck; > + struct pendconn *p, *pback; > int xferred = 0; > > + /* The REDISP option was specified. We will ignore cookie and force to > + * balance or use the dispatcher. */ > + if ((s->proxy->options & (PR_O_REDISP|PR_O_PERSIST)) != PR_O_REDISP) > + return 0; > + > HA_SPIN_LOCK(SERVER_LOCK, &s->lock); > - list_for_each_entry_safe(pc, pc_bck, &s->pendconns, list) { > - struct stream *strm = pc->strm; > + list_for_each_entry_safe(p, pback, &s->pendconns, list) { > + if (p->strm_flags & SF_FORCE_PRST) > + continue; > > - if ((strm->be->options & (PR_O_REDISP|PR_O_PERSIST)) == > PR_O_REDISP && > - !(strm->flags & SF_FORCE_PRST)) { > - /* The REDISP option was specified. We will ignore > - * cookie and force to balance or use the dispatcher. > - */ > + if (HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock)) > + continue; > > - /* it's left to the dispatcher to choose a server */ > - strm->flags &= ~(SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET); > + /* it's left to the dispatcher to choose a server */ > + pendconn_unlinked(p); > + p->strm_flags = 0; It's probably better to only clear the interesting flags (those above) so that the real intent is not lost. This will certainly help when code changes again in the future. > - __pendconn_free(pc); > - task_wakeup(strm->task, TASK_WOKEN_RES); > - xferred++; > - } > + task_wakeup(p->strm->task, TASK_WOKEN_RES); > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > } > HA_SPIN_UNLOCK(SERVER_LOCK, &s->lock); > return xferred; > @@ -238,66 +238,121 @@ int pendconn_redistribute(struct server *s) > */ > int pendconn_grab_from_px(struct server *s) > { > - int xferred; > + struct pendconn *p, *pback; > + int maxconn, xferred = 0; > > if (!srv_currently_usable(s)) > return 0; > > HA_SPIN_LOCK(PROXY_LOCK, &s->proxy->lock); > - for (xferred = 0; !s->maxconn || xferred < srv_dynamic_maxconn(s); > xferred++) { > - struct stream *strm; > - struct pendconn *p; > - > - p = pendconn_from_px(s->proxy); > - if (!p) > + maxconn = srv_dynamic_maxconn(s); > + list_for_each_entry_safe(p, pback, &s->proxy->pendconns, list) { > + if (s->maxconn && s->served + xferred >= maxconn) > break; > - p->strm->target = &s->obj_type; > - strm = p->strm; > - __pendconn_free(p); > - task_wakeup(strm->task, TASK_WOKEN_RES); > + > + if (HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock)) > + continue; > + > + pendconn_unlinked(p); > + p->strm_flags = (p->strm_flags & (SF_DIRECT | SF_ASSIGNED | > SF_ADDR_SET)); So here no need to change these flags anymore. > + p->srv = s; > + > + task_wakeup(p->strm->task, TASK_WOKEN_RES); > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > + xferred++; > } > HA_SPIN_UNLOCK(PROXY_LOCK, &s->proxy->lock); > return xferred; > } > > -/* > - * Detaches pending connection <p>, decreases the pending count, and frees > - * the pending connection. The connection might have been queued to a > specific > - * server as well as to the proxy. The stream also gets marked unqueued. > +/* Try to dequeue pending connection attached to the stream <strm>. It must > + * always exists here. If the pendconn is still linked to the server or the > + * proxy queue, nothing is done and the function returns 1. Otherwise, > + * <strm>->flags and <strm>->target are updated, the pendconn is released > and 0 > + * is returned. > + */ > +int pendconn_dequeue(struct stream *strm) > +{ > + struct pendconn *p; > + > + if (unlikely(!strm->pend_pos)) { > + /* unexpected case because it is called by the stream itself and > + * only the stream can release a pendconn. So it is only > + * possible if a pendconn is released by someone else or if the > + * stream is supposed to be queued but without its associated > + * pendconn. In both cases it is a bug! */ > + abort(); > + } > + p = strm->pend_pos; > + HA_SPIN_LOCK(PENDCONN_LOCK, &p->lock); > + > + /* the pendconn is still linked to the server/proxy queue, so unlock it > + * and go away. */ > + if (!LIST_ISEMPTY(&p->list)) { > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > + return 1; > + } Just a question, in case it's still linked, when will it be forced to be dequeued ? I'm unsure about the situation where we can end up here with an entry still linked in fact, and would like to ensure we never leave an outdated entry in the queue. > + /* the pendconn must be dequeued now */ > + if (p->srv) > + strm->target = &p->srv->obj_type; > + > + strm->flags &= ~(SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET); > + strm->flags |= p->strm_flags; Here that would simply give : strm->flags &= ~(SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET); strm->flags |= p->strm_flags & (SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET); > + strm->pend_pos = NULL; > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > + pool_free(pool_head_pendconn, p); > + return 0; > +} > + > +/* Release the pending connection <p>, and decreases the pending count if > + * needed. The connection might have been queued to a specific server as > well as > + * to the proxy. The stream also gets marked unqueued. > */ > void pendconn_free(struct pendconn *p) > { > + struct stream *strm = p->strm; > + > + HA_SPIN_LOCK(PENDCONN_LOCK, &p->lock); > + > + /* The pendconn was already unlinked, just release it. */ > + if (LIST_ISEMPTY(&p->list)) > + goto release; > + > if (p->srv) { > HA_SPIN_LOCK(SERVER_LOCK, &p->srv->lock); > + p->srv->nbpend--; > LIST_DEL(&p->list); > HA_SPIN_UNLOCK(SERVER_LOCK, &p->srv->lock); > - HA_ATOMIC_SUB(&p->srv->nbpend, 1); > } > else { > - HA_SPIN_LOCK(SERVER_LOCK, &p->strm->be->lock); > + HA_SPIN_LOCK(PROXY_LOCK, &strm->be->lock); > + strm->be->nbpend--; > LIST_DEL(&p->list); > - HA_SPIN_UNLOCK(SERVER_LOCK, &p->strm->be->lock); > - HA_ATOMIC_SUB(&p->strm->be->nbpend, 1); > + HA_SPIN_UNLOCK(PROXY_LOCK, &strm->be->lock); > } > - p->strm->pend_pos = NULL; > - HA_ATOMIC_SUB(&p->strm->be->totpend, 1); > + HA_ATOMIC_SUB(&strm->be->totpend, 1); I'm just seeing that on a few occurrences above, we only need <strm> to access the backend's lock. Most likely the code can be simplified by adding the pointer to the backend into the pendconn struct as well, just like we already have the pointer to the server. > + release: > + strm->pend_pos = NULL; > + HA_SPIN_UNLOCK(PENDCONN_LOCK, &p->lock); > pool_free(pool_head_pendconn, p); > } > > -/* Lock-free version of pendconn_free. */ > -static void __pendconn_free(struct pendconn *p) > +/* Remove the pendconn from the server/proxy queue. At this stage, the > + * connection is not really dequeued. It will be done during the > + * process_stream. This function must be called by function owning the locks > on > + * the pendconn _AND_ the server/proxy. It also decreases the pending count. > + */ > +static void pendconn_unlinked(struct pendconn *p) Just a minor matter of taste, for me the use of a past participle in the name "pendconn_unlinked" makes me think it returns a boolean indicating whether or not it's unlinked. Since it's an action in fact, it's probably better to call it "pendconn_unlink". Thanks! Willy