On 03/04/18 02:08, Alexei Starovoitov wrote: > I like patch 3 and going to play with it. > How did you test it ? Just test_verifier and test_progs (the latter has a failure "test_bpf_obj_id:FAIL:get-prog-info(fd) err 0 errno 2 i 0 type 1(1) info_len 80(40) jit_enabled 0 jited_prog_len 0 xlated_prog_len 72" but that was already there before my patch).
> Do you have processed_insn numbers for > cilium or selftests programs before and after? > There should be no difference, right? That's a good check, I'll do that. > As far as patch 1 it was very difficult to review, since several logically > different things clamped together. So breaking it apart: > - converting two arrays of subprog_starts and subprog_stack_depth into > single array of struct bpf_subprog_info is a good thing > - tsort is interesting, but not sure it's correct. more below > - but main change of combining subprog logic with do_check is no good. <snip> > There will be no 'do_check() across whole program' walk. > Combining subprog pass with do_check is going into opposite direction > of this long term work. Divide and conquer. Combining more things into > do_check is the opposite of this programming principle. The main object of my change here was to change the data structures, to store a subprogno in each insn aux rather than using bsearch() on the subprog_starts. I have now figured out the algorithm to do this in its own pass (previously I thought this needed a recursive walk which is why I wanted to roll it into do_check() - doing more than one whole-program recursive walk seems like a bad idea.) > My short term plan is to split basic instruction correctness checks > out of do_check() loop into separate pass and run it early on. I agree with that short term plan, sounds like a good idea. I'm still not sure I understand the long-term plan, though; since most insns' input registers will still need to be checked (I'm assuming majority of most real ebpf programs consists of computing and dereferencing pointers), the data flow analysis will have to end up doing all the same register updates current do_check() does (though potentially in a different order), e.g. if a function is called three times it will have to analyse it with three sets of input registers. Unless you have some way of specifying function preconditions, I don't see how it works. In particular something like char *f(char *p) { *p++ = 0; return p; } int main(void) { char *p = "abc"; /* represents getting a ptr from ctx or map */ p = f(p); p = f(p); p = f(p); return 0; } seems as though it would be difficult to analyse in any way more scalable than the current full recursive walk. Unless you somehow propagate register state _backwards_ as constraints when analysing a function...? In any case it seems like there are likely to be things which current verifier accepts which require 'whole-program' analysis to determine the dataflow (e.g. what if there were some conditional branches in f(), and the preconditions on p depended on the value of some other arg in r2?) > As far as tsort approach for determining max stack depth... > It's an interesting idea, but implementation is suffering from the same > 'combine everything into one loop' coding issue. > I think updating total_stack_depth math should be separate from sorting, > since the same function can be part of different stacks with different max. > I don't see how updating global subprog_info[i].total_stack_depth can > be correct. It has to be different for every stack and the longest > stack is not necessarily the deepest. May be I'm missing something, > but combined sort and stack_depth math didn't make it easy to review. > I find existing check_max_stack_depth() algo much easier to understand. The sort _is_ the way to compute total stack depth. The sort order isn't being stored anywhere; it's being done just so that each subprog gets looked at after all its callers have been considered. So when it gets selected as a 'frontier node', its maximum stack depth is known, and can thus be used to update its callees (note that we do a max_t() with each callee's existing total_stack_depth, thus getting the deepest stack of all call chains to the function). It may help to imagine drawing the call graph and labelling each node with a stack depth as it is visited; sadly that's difficult to show in an email (or a code comment). But I can try to explain it a bit better than "/* Update callee total stack depth */". I will also try to split up patch #1 into more pieces. I mistakenly thought that existing check_max_stack_depth() depended on some invariants that I was removing, but I guess that was only true while I had non-contiguous subprogs. -Ed