On Fri, Nov 4, 2016 at 12:07 PM, Jason Baron <jba...@akamai.com> wrote: > > > On 11/04/2016 02:43 PM, Alexander Duyck wrote: >> >> On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron <jba...@akamai.com> wrote: >>> >>> From: Jason Baron <jba...@akamai.com> >>> >>> When read() is called on /proc/net/route requesting a size that is one >>> entry size (128 bytes) less than m->size or greater, the resulting output >>> has missing and/or duplicate entries. Since m->size is typically >>> PAGE_SIZE, >>> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968 >>> bytes will see bogus output. >>> >>> For example: >>> >>> for i in {100..200}; do >>> ip route add 192.168.1.$i dev eth0 >>> done >>> dd if=/proc/net/route of=/tmp/good bs=1024 >>> dd if=/proc/net/route of=/tmp/bad bs=4096 >>> >>> # diff -q /tmp/good /tmp/bad >>> Files /tmp/good and /tmp/bad differ >>> >>> I think this has gone unnoticed, since the output of 'netstat -r' and >>> 'route' is generated by reading in 1,024 byte increments and thus not >>> corrupted. Further, the number of entries in the route table needs to be >>> sufficiently large in order to trigger the problematic case. >>> >>> The issue arises because fib_route_get_idx() does not properly handle >>> the case where pos equals iter->pos. This case only arises when we have >>> a large read buffer size because we end up re-requesting the last entry >>> that overflowed m->buf. In the case of a smaller read buffer size, >>> we don't exceed the size of m->buf, and thus fib_route_get_idx() is >>> called >>> with pos greater than iter->pos. >>> >>> Fix by properly handling the iter->pos == pos case. >>> >>> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in >>> /proc/net/route") >>> Cc: Andy Whitcroft <a...@canonical.com> >>> Cc: Alexander Duyck <alexander.h.du...@intel.com> >>> Signed-off-by: Jason Baron <jba...@akamai.com> >>> --- >>> net/ipv4/fib_trie.c | 12 ++++++++++-- >>> 1 file changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c >>> index 31cef3602585..1017533fc75c 100644 >>> --- a/net/ipv4/fib_trie.c >>> +++ b/net/ipv4/fib_trie.c >>> @@ -2411,12 +2411,17 @@ static struct key_vector >>> *fib_route_get_idx(struct fib_route_iter *iter, >>> loff_t pos) >>> { >>> struct key_vector *l, **tp = &iter->tnode; >>> + loff_t saved_pos = 0; >>> t_key key; >>> >>> /* use cache location of next-to-find key */ >>> if (iter->pos > 0 && pos >= iter->pos) { >>> pos -= iter->pos; >>> key = iter->key; >>> + if (pos == 0) { >>> + saved_pos = iter->pos; >>> + key--; >>> + } >>> } else { >>> iter->pos = 0; >>> key = 0; >>> @@ -2436,10 +2441,13 @@ static struct key_vector >>> *fib_route_get_idx(struct fib_route_iter *iter, >>> break; >>> } >>> >>> - if (l) >>> + if (l) { >>> iter->key = key; /* remember it */ >>> - else >>> + if (saved_pos) >>> + iter->pos = saved_pos; >>> + } else { >>> iter->pos = 0; /* forget it */ >>> + } >>> >>> return l; >>> } >> >> >> This doesn't seem correct to me. I will have to look through this. >> My understanding is that the value of iter->pos is supposed to be the >> next position for us to grab, not the last one that was retrieved. If >> we are trying to re-request the last value then we should be falling >> back into the else case for this since pos should be one less than >> iter->pos. The problem is the table could change out from under us >> which is one of the reasons why we don't want to try and rewind the >> key like you are doing here. >> >> - Alex >> > > Hi Alex, > > In this case, seq_read() has called m->op->next(), which sets iter->pos > equal to pos and iter->key to key + 1. However, when we then go to output > the item associated with key, the 'm->op->next()' call overflows. Thus, we > have a situation where iter->pos equals pos, iter->key = key + 1, but we > have not displayed the item at position 'key' (thus the bug is that we miss > the item at key). > > The change I proposed was simply to restart the search from 'key' in this > case. If that item has disappeared, we will output the next one, or if its > been replaced we will display its replacement. I think that is > ok? > > The bug could also be fixed by changing: > > if (iter->pos > 0 && pos >= iter->pos) { > > to say: > > if (iter->pos > 0 && pos > iter->pos) { > > But that restarts the search on every overflow, which could mean every page > size, and that seems suboptimal to me. Like-wise, if we make pos 1 less than > iter->pos that restarts the search. The idea with this patch is to not force > us to redo the entire search on each overflow. > > Thanks, > > -Jason
Actually I think the underlying issue is that we still have an unresolved off by one error. Specifically offset 0 actually represents two values, the header for the file and entry 0. I think the fix is for us to look at pushing pos to 1 for the start of the data, and we reserve POS 0 for SEQ_START_TOKEN. Doing that we should start at the correct offset for each section following the first page. - Alex