On Tue, Apr 16, 2019 at 4:28 PM Luck, Tony <tony.l...@intel.com> wrote:
>
> On Tue, Apr 16, 2019 at 04:18:57PM -0700, Cong Wang wrote:
> > > The problem case occurs when we've seen enough distinct
> > > errors that we have filled every entry, then we try to
> > > look up a pfn that is larger than any seen before.
> > >
> > > The loop:
> > >
> > >         while (min < max) {
> > >                 ...
> > >         }
> > >
> > > will terminate with "min" set to MAX_ELEMS. Then we
> > > execute:
> > >
> > >         this_pfn = PFN(ca->array[min]);
> > >
> > > which references beyond the end of the space allocated
> > > for ca->array.
> >
> > Exactly.
>
> Hmmm. But can we ever really have this happen?  The call
> sequence to get here looks like:
>
>
>         mutex_lock(&ce_mutex);
>
>         if (ca->n == MAX_ELEMS)
>                 WARN_ON(!del_lru_elem_unlocked(ca));
>
>         ret = find_elem(ca, pfn, &to);
>
> I.e. if the array was all the way full, we delete one element
> before calling find_elem().  So when we get here:
>
> static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
> {
>         u64 this_pfn;
>         int min = 0, max = ca->n;
>
> The biggest value "max" can have is MAX_ELEMS-1

This is exactly why the crash is inside memmove() rather than
inside find_elem(). del_elem() actually makes an off-by-two
access once we pass its 'if' check in line 232:

229 static void del_elem(struct ce_array *ca, int idx)
230 {
231         /* Save us a function call when deleting the last element. */
232         if (ca->n - (idx + 1))
233                 memmove((void *)&ca->array[idx],
234                         (void *)&ca->array[idx + 1],
235                         (ca->n - (idx + 1)) * sizeof(u64));
236
237         ca->n--;
238 }

If idx is ca->n and ca->n is MAX_ELEMS-1, then ca->n - (idx + 1)
is nonzero, so the 'if' check in line 232 passes and memmove() is
called. Its source starts at idx+1, which is MAX_ELEMS, just beyond
the valid range.
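
To make the arithmetic concrete, here is a small userspace sketch, not
the kernel code. It assumes ca->n is an unsigned int (the struct
definition is not quoted in this thread), and SKETCH_MAX_ELEMS and all
helper names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, for illustration only; not the kernel's MAX_ELEMS. */
#define SKETCH_MAX_ELEMS 512

/*
 * Element count that del_elem() would hand to memmove(), mirroring the
 * expression (ca->n - (idx + 1)). Assumption: n is unsigned, so the
 * subtraction wraps around instead of going negative.
 */
static unsigned int elems_to_move(unsigned int n, int idx)
{
        return n - (unsigned int)(idx + 1);
}

/* Index of the first source element that memmove() would read. */
static int first_src_index(int idx)
{
        return idx + 1;
}

/* Nonzero only when idx names an element that actually exists. */
static int idx_is_valid(unsigned int n, int idx)
{
        return idx >= 0 && (unsigned int)idx < n;
}
```

With n == SKETCH_MAX_ELEMS - 1 and idx == n, elems_to_move() returns a
nonzero (wrapped) value, so the line-232 style check passes, and
first_src_index() returns SKETCH_MAX_ELEMS. A caller that checked
idx_is_valid() first would never reach the memmove() in that case.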

Thanks.
