----- On Aug 11, 2016, at 9:28 PM, Boqun Feng boqun.f...@gmail.com wrote: > On Thu, Aug 11, 2016 at 11:26:30PM +0000, Mathieu Desnoyers wrote: >> ----- On Jul 24, 2016, at 2:01 PM, Dave Watson davejwat...@fb.com wrote: >> >> >>> +static inline __attribute__((always_inline)) >> >>> +bool rseq_finish(struct rseq_lock *rlock, >> >>> + intptr_t *p, intptr_t to_write, >> >>> + struct rseq_state start_value) >> > >> >>> This ABI looks like it will work fine for our use case. I don't think it >> >>> has been mentioned yet, but we may still need multiple asm blocks >> >>> for differing numbers of writes. For example, an array-based freelist >> >>> push: >> > >> >>> void push(void *obj) { >> >>> if (index < maxlen) { >> >>> freelist[index++] = obj; >> >>> } >> >>> } >> > >> >>> would be more efficiently implemented with a two-write rseq_finish: >> > >> >>> rseq_finish2(&freelist[index], obj, // first write >> >>> &index, index + 1, // second write >> >>> ...); >> > >> >> Would pairing one rseq_start with two rseq_finish do the trick >> >> there ? >> > >> > Yes, two rseq_finish works, as long as the extra rseq management overhead >> > is not substantial. >> >> I've added a commit implementing rseq_finish2() in my rseq volatile >> dev branch. You can fetch it at: >> >> https://github.com/compudj/linux-percpu-dev/tree/rseq-fallback >> >> I also have a separate test and benchmark tree in addition to the >> kernel selftests here: >> >> https://github.com/compudj/rseq-test >> >> I named the first write a "speculative" write, and the second write >> the "final" write. >> > > Maybe I miss something subtle, but if the first write is only a > "speculative" write, why can't we put it in the rseq critical section > rather than asm block? Like this: > > do_rseq(..., result, targetptr, newval > { > newval = index; > targetptr = &index; > if (newval < maxlen) > freelist[newval++] = obj; > else > result = false; > } > > No extra rseq_finish() is needed here, but maybe a little more > "speculative" writes?
This won't work unfortunately. The speculative stores need to be between the rseq_event_counter comparison instruction in the rseq_finish asm sequence and the final store. The ip fixup is really needed for correctness of speculative stores. The sequence number scheme only works for loads. Putting it in the C code between rseq_start and rseq_finish would lead to races such as: thread A thread B rseq_start <preempted> <sched in> rseq_start freelist[offset + 1] = obj rseq_finish offset++ <preempted> <sched in> freelist[newval + 1] = obj <--- corrupts the list content. <snip> > Besides, do we allow userspace programs do read-only access to the > memory objects modified by do_rseq(). If so, we have a problem when > there are two writes in a do_rseq()(either in the rseq critical section > or in the asm block), because in current implemetation, these two writes > are unordered, which makes the readers outside a do_rseq() could observe > the ordering of writes differently. > > For rseq_finish2(), a simple solution would be making the "final" write > a RELEASE. Indeed, we would need a release semantic for the final store here if this is the common use. Or we could duplicate the "flavors" of rseq_finish2 and add a rseq_finish2_release. We should find a way to eliminate code duplication there. I suspect we'll end up doing macros. Thanks, Mathieu > > Regards, > Boqun > >> Thanks, >> >> Mathieu >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. > > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com