On 03/07/2018 04:25 AM, John Fastabend wrote: [...] > Thought about this a bit more and chatted with Daniel a bit. I think > a better solution is to set data_start = data_end = 0 by default in the > sendpage case. This will disallow any read/writes into the sendpage > data. Then if the user needs to read/write data we can use a helper > bpf_sk_msg_pull_data(start_byte, end_byte) which can pull the data into a > linear buffer as needed. This will ensure any user writes will not > change data after the BPF verdict (your concern). Also it will minimize > the amount of data that needs to be copied (my concern). In some of my > use cases where no data is needed we can simple not use the helper. Then > on the sendmsg side we can continue to set the (data_start, data_end) > pointers to the first scatterlist element. But, also use this helper to > set the data pointers past the first scatterlist element if needed. So > if someone wants to read past the first 4k bytes on a large send for > example this can be done with the above helper. BPF programs just > need to check (start,end) data pointers and can be oblivious to > if the program is being invoked by a call from sendpage or sendmsg. > > I think this is a fairly elegant solution. Finally we can further > optimize later with a flag if needed to cover the case where we > want to read lots of bytes but _not_ do the copy. We can debate > the usefulness of this later with actual perf data. > > All this has the added bonus that all I need is another patch on > top to add the helper. Pseudo code might look like this, > > my_bpf_prog(struct sk_msg_md *msg) { > void *data_end = msg->data_end; > void *data_start = msg->data_start; > > need = PARSE_BYTES; > > // ensure user actually sent full header > if (msg->size < PARSE_BYTES) { > bpf_msg_cork(PARSE_BYTES); > return SK_DROP; > } > > /* ensure we can read full header, if this is a > * sendmsg system call AND PARSE_BYTES are all in > * the first scatterlist elem this is a no-op. > * If this is a sendpage call will put PARSE_BYTES > * in a psock buffer to avoid user modifications. > */ > if (data_end - data_start < PARSE_BYTES) {
I think it might need to look like 'data_start + PARSE_BYTES > data_end' for verifier to recognize (unless LLVM generates code that way). > err = bpf_sk_msg_pull_data(0, PARSE_BYTES, flags); > if (err) > return SK_DROP; Above should be: if (unlikely(err || data_start + PARSE_BYTES > data_end)) return SK_DROP; Here for the successful case, you need to recheck since data pointers were invalidated due to the helper call. bpf_sk_msg_pull_data() would for the very first case potentially be called unconditionally at prog start though when you start out with 0 len anyway, basically right after msg->size test. > } > > // we have the full header parse it now > verdict = my_bpf_header_parser(msg); > return verdict; > } > > Future optimization can work with prologue to pull in bytes > more efficiently. And for what its worth I found a couple bugs > in the error path of the sendpage hook I can fix in the v2 as well. > > What do you think? > > @Daniel, sound more or less like what you were thinking? Yes, absolutely what I was thinking. We have exactly the same logic in tc/BPF today for the case when the direct packet access test fails and we want to pull in skb data from non-linear area, so we can in such case just call bpf_skb_pull_data(skb, len) and redo the test to access it privately after that. Thanks, Daniel