Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Tue, 28 Jan 2014 17:19:38 -0800 Davidlohr Bueso wrote:
> On Tue, 2014-01-28 at 19:36 -0500, Naoya Horiguchi wrote:
> > On Mon, Jan 27, 2014 at 06:34:17PM -0800, Davidlohr Bueso wrote:
> [...]
> > > > If this retry is really essential for the fix, please comment the reason
> > > > both in the patch description and an inline comment. It's very important
> > > > for future code maintenance.
> > >
> > > So we locate the corresponding region in the reserve map, and if we are
> > > below the current region, then we allocate a new one. Since we dropped
> > > the lock to allocate memory, we have to make sure that we still need the
> > > new region and that we don't race with the new status of the reservation
> > > map. This is the whole point of the retry, and I don't see it being
> > > suboptimal.
> >
> > I'm afraid that you don't explain why you need to drop the lock for memory
> > allocation. Are you saying that this unlocking comes from the difference
> > between rwsem and spinlock?
>
> Because you cannot go to sleep while holding a spinlock, which is
> exactly what kmalloc(GFP_KERNEL) can do. We *might* get away with it
> with GFP_ATOMIC, I dunno, but I certainly prefer this approach.

yup.  You could do

	foo = kmalloc(size, GFP_NOWAIT);
	if (!foo) {
		spin_unlock(...);
		foo = kmalloc(size, GFP_KERNEL);
		if (!foo)
			...
		spin_lock(...);
	}

That avoids the lock/unlock once per allocation. But it also increases
the lock's average hold times.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Tue, 2014-01-28 at 19:36 -0500, Naoya Horiguchi wrote:
> On Mon, Jan 27, 2014 at 06:34:17PM -0800, Davidlohr Bueso wrote:
[...]
> > > If this retry is really essential for the fix, please comment the reason
> > > both in the patch description and an inline comment. It's very important
> > > for future code maintenance.
> >
> > So we locate the corresponding region in the reserve map, and if we are
> > below the current region, then we allocate a new one. Since we dropped
> > the lock to allocate memory, we have to make sure that we still need the
> > new region and that we don't race with the new status of the reservation
> > map. This is the whole point of the retry, and I don't see it being
> > suboptimal.
>
> I'm afraid that you don't explain why you need to drop the lock for memory
> allocation. Are you saying that this unlocking comes from the difference
> between rwsem and spinlock?

Because you cannot go to sleep while holding a spinlock, which is
exactly what kmalloc(GFP_KERNEL) can do. We *might* get away with it
with GFP_ATOMIC, I dunno, but I certainly prefer this approach.

Thanks,
Davidlohr
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Mon, Jan 27, 2014 at 06:34:17PM -0800, Davidlohr Bueso wrote:
> On Mon, 2014-01-27 at 20:53 -0500, Naoya Horiguchi wrote:
> > Hi Davidlohr,
> >
> > On Mon, Jan 27, 2014 at 01:44:02PM -0800, Davidlohr Bueso wrote:
> > > On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote:
> > > > On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> > > > > From: Joonsoo Kim
> > > > >
> > > > > There is a race condition if we map the same file in different
> > > > > processes. Region tracking is protected by mmap_sem and
> > > > > hugetlb_instantiation_mutex. When we do mmap, we don't grab a
> > > > > hugetlb_instantiation_mutex, but only the mmap_sem (exclusively).
> > > > > This doesn't prevent other tasks from modifying the region
> > > > > structure, so it can be modified by two processes concurrently.
> > > > >
> > > > > To solve this, introduce a spinlock to resv_map and make the region
> > > > > manipulation functions grab it before they do actual work.
> > > > >
> > > > > Acked-by: David Gibson
> > > > > Signed-off-by: Joonsoo Kim
> > > > > [Updated changelog]
> > > > > Signed-off-by: Davidlohr Bueso
> > > > > ---
> > > > ...
> > > > > @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
> > > > >  	 * Subtle, allocate a new region at the position but make it zero
> > > > >  	 * size such that we can guarantee to record the reservation. */
> > > > >  	if (&rg->link == head || t < rg->from) {
> > > > > -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > > > > -		if (!nrg)
> > > > > -			return -ENOMEM;
> > > > > +		if (!nrg) {
> > > > > +			spin_unlock(&resv->lock);
> > > >
> > > > I think that doing kmalloc() inside the lock is simpler.
> > > > Why do you unlock and retry here?
> > >
> > > This is a spinlock, no can do -- we've previously debated this and since
> > > the critical region is quite small, a non-blocking lock is better suited
> > > here. We do the retry so we don't race once the new region is allocated
> > > after the lock is dropped.
> >
> > Using a spinlock instead of the rwsem makes sense.
> > But I'm not sure how the retry is essential to fix the race.
> > (Sorry I can't find the discussion log about this.)
> > As you did in your ver.1 (https://lkml.org/lkml/2013/7/26/296),
> > simply doing like below seems to be fine to me, is it right?
> >
> > 	if (&rg->link == head || t < rg->from) {
> > 		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > 		if (!nrg) {
> > 			chg = -ENOMEM;
> > 			goto out_locked;
> > 		}
> > 		nrg->from = f;
> > 		...
> > 	}
>
> That's nice and simple because we were using the rwsem version.
>
> > In the current version nrg is initialized to NULL, so we always do the
> > retry once when adding a new file_region. That's not optimal to me.
>
> Right, the retry can only occur once.
>
> > If this retry is really essential for the fix, please comment the reason
> > both in the patch description and an inline comment. It's very important
> > for future code maintenance.
>
> So we locate the corresponding region in the reserve map, and if we are
> below the current region, then we allocate a new one. Since we dropped
> the lock to allocate memory, we have to make sure that we still need the
> new region and that we don't race with the new status of the reservation
> map. This is the whole point of the retry, and I don't see it being
> suboptimal.

I'm afraid that you don't explain why you need to drop the lock for memory
allocation. Are you saying that this unlocking comes from the difference
between rwsem and spinlock?

I think if we call kmalloc() with the lock held we don't have to check
that "we still need the new region", because resv->lock guarantees that
no other thread changes the reservation map, right?

> We just cannot retake the lock after we get the new region and just add
> it to the list.
>
> > And I noticed another point. I don't think the name of the new goto label
> > 'out_locked' is a good one. 'out_unlock' or 'unlock' is better.
>
> What worries me more is that we're actually freeing a valid new region
> (nrg) upon exit. We certainly don't do so in the current code, and it
> doesn't seem to be a leak. Instead, we should be doing:

You're right. There is another goto in region_chg() where we never do
the kmalloc, so calling kfree is a bug.

Thanks,
Naoya Horiguchi

> 	if (&rg->link == head || t < rg->from) {
> 		if (!nrg) {
> 			spin_unlock(&resv->lock);
> 			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> 			if (!nrg)
> 				return -ENOMEM;
>
> 			goto retry;
> 		}
>
> 		nrg->from = f;
> 		nrg->to = f;
> 		INIT_LIST_HEAD(&nrg->link);
>
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Mon, 2014-01-27 at 20:53 -0500, Naoya Horiguchi wrote:
> Hi Davidlohr,
>
> On Mon, Jan 27, 2014 at 01:44:02PM -0800, Davidlohr Bueso wrote:
> > On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote:
> > > On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> > > > From: Joonsoo Kim
> > > >
> > > > There is a race condition if we map the same file in different
> > > > processes. Region tracking is protected by mmap_sem and
> > > > hugetlb_instantiation_mutex. When we do mmap, we don't grab a
> > > > hugetlb_instantiation_mutex, but only the mmap_sem (exclusively).
> > > > This doesn't prevent other tasks from modifying the region
> > > > structure, so it can be modified by two processes concurrently.
> > > >
> > > > To solve this, introduce a spinlock to resv_map and make the region
> > > > manipulation functions grab it before they do actual work.
> > > >
> > > > Acked-by: David Gibson
> > > > Signed-off-by: Joonsoo Kim
> > > > [Updated changelog]
> > > > Signed-off-by: Davidlohr Bueso
> > > > ---
> > > ...
> > > > @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
> > > >  	 * Subtle, allocate a new region at the position but make it zero
> > > >  	 * size such that we can guarantee to record the reservation. */
> > > >  	if (&rg->link == head || t < rg->from) {
> > > > -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > > > -		if (!nrg)
> > > > -			return -ENOMEM;
> > > > +		if (!nrg) {
> > > > +			spin_unlock(&resv->lock);
> > >
> > > I think that doing kmalloc() inside the lock is simpler.
> > > Why do you unlock and retry here?
> >
> > This is a spinlock, no can do -- we've previously debated this and since
> > the critical region is quite small, a non-blocking lock is better suited
> > here. We do the retry so we don't race once the new region is allocated
> > after the lock is dropped.
>
> Using a spinlock instead of the rwsem makes sense.
> But I'm not sure how the retry is essential to fix the race.
> (Sorry I can't find the discussion log about this.)
> As you did in your ver.1 (https://lkml.org/lkml/2013/7/26/296),
> simply doing like below seems to be fine to me, is it right?
>
> 	if (&rg->link == head || t < rg->from) {
> 		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> 		if (!nrg) {
> 			chg = -ENOMEM;
> 			goto out_locked;
> 		}
> 		nrg->from = f;
> 		...
> 	}

That's nice and simple because we were using the rwsem version.

> In the current version nrg is initialized to NULL, so we always do the
> retry once when adding a new file_region. That's not optimal to me.

Right, the retry can only occur once.

> If this retry is really essential for the fix, please comment the reason
> both in the patch description and an inline comment. It's very important
> for future code maintenance.

So we locate the corresponding region in the reserve map, and if we are
below the current region, then we allocate a new one. Since we dropped
the lock to allocate memory, we have to make sure that we still need the
new region and that we don't race with the new status of the reservation
map. This is the whole point of the retry, and I don't see it being
suboptimal. We just cannot retake the lock after we get the new region
and just add it to the list.

> And I noticed another point. I don't think the name of the new goto label
> 'out_locked' is a good one. 'out_unlock' or 'unlock' is better.

What worries me more is that we're actually freeing a valid new region
(nrg) upon exit. We certainly don't do so in the current code, and it
doesn't seem to be a leak. Instead, we should be doing:

	if (&rg->link == head || t < rg->from) {
		if (!nrg) {
			spin_unlock(&resv->lock);
			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
			if (!nrg)
				return -ENOMEM;

			goto retry;
		}

		nrg->from = f;
		nrg->to = f;
		INIT_LIST_HEAD(&nrg->link);
		list_add(&nrg->link, rg->link.prev);
		chg = t - f;
		goto out;
	}
	...
out:
	spin_unlock(&resv->lock);
	return chg;

Thanks,
Davidlohr
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
Hi Davidlohr,

On Mon, Jan 27, 2014 at 01:44:02PM -0800, Davidlohr Bueso wrote:
> On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote:
> > On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> > > From: Joonsoo Kim
> > >
> > > There is a race condition if we map the same file in different
> > > processes. Region tracking is protected by mmap_sem and
> > > hugetlb_instantiation_mutex. When we do mmap, we don't grab a
> > > hugetlb_instantiation_mutex, but only the mmap_sem (exclusively).
> > > This doesn't prevent other tasks from modifying the region
> > > structure, so it can be modified by two processes concurrently.
> > >
> > > To solve this, introduce a spinlock to resv_map and make the region
> > > manipulation functions grab it before they do actual work.
> > >
> > > Acked-by: David Gibson
> > > Signed-off-by: Joonsoo Kim
> > > [Updated changelog]
> > > Signed-off-by: Davidlohr Bueso
> > > ---
> > ...
> > > @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
> > >  	 * Subtle, allocate a new region at the position but make it zero
> > >  	 * size such that we can guarantee to record the reservation. */
> > >  	if (&rg->link == head || t < rg->from) {
> > > -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > > -		if (!nrg)
> > > -			return -ENOMEM;
> > > +		if (!nrg) {
> > > +			spin_unlock(&resv->lock);
> >
> > I think that doing kmalloc() inside the lock is simpler.
> > Why do you unlock and retry here?
>
> This is a spinlock, no can do -- we've previously debated this and since
> the critical region is quite small, a non-blocking lock is better suited
> here. We do the retry so we don't race once the new region is allocated
> after the lock is dropped.

Using a spinlock instead of the rwsem makes sense.
But I'm not sure how the retry is essential to fix the race.
(Sorry I can't find the discussion log about this.)
As you did in your ver.1 (https://lkml.org/lkml/2013/7/26/296),
simply doing like below seems to be fine to me, is it right?

	if (&rg->link == head || t < rg->from) {
		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
		if (!nrg) {
			chg = -ENOMEM;
			goto out_locked;
		}
		nrg->from = f;
		...
	}

In the current version nrg is initialized to NULL, so we always do the
retry once when adding a new file_region. That's not optimal to me.

If this retry is really essential for the fix, please comment the reason
both in the patch description and an inline comment. It's very important
for future code maintenance.

And I noticed another point. I don't think the name of the new goto label
'out_locked' is a good one. 'out_unlock' or 'unlock' is better.

Thanks,
Naoya Horiguchi
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote:
> On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> > From: Joonsoo Kim
> >
> > There is a race condition if we map the same file in different
> > processes. Region tracking is protected by mmap_sem and
> > hugetlb_instantiation_mutex. When we do mmap, we don't grab a
> > hugetlb_instantiation_mutex, but only the mmap_sem (exclusively).
> > This doesn't prevent other tasks from modifying the region
> > structure, so it can be modified by two processes concurrently.
> >
> > To solve this, introduce a spinlock to resv_map and make the region
> > manipulation functions grab it before they do actual work.
> >
> > Acked-by: David Gibson
> > Signed-off-by: Joonsoo Kim
> > [Updated changelog]
> > Signed-off-by: Davidlohr Bueso
> > ---
> ...
> > @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
> >  	 * Subtle, allocate a new region at the position but make it zero
> >  	 * size such that we can guarantee to record the reservation. */
> >  	if (&rg->link == head || t < rg->from) {
> > -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > -		if (!nrg)
> > -			return -ENOMEM;
> > +		if (!nrg) {
> > +			spin_unlock(&resv->lock);
>
> I think that doing kmalloc() inside the lock is simpler.
> Why do you unlock and retry here?

This is a spinlock, no can do -- we've previously debated this and since
the critical region is quite small, a non-blocking lock is better suited
here. We do the retry so we don't race once the new region is allocated
after the lock is dropped.

Thanks,
Davidlohr
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> From: Joonsoo Kim
>
> There is a race condition if we map the same file in different
> processes. Region tracking is protected by mmap_sem and
> hugetlb_instantiation_mutex. When we do mmap, we don't grab a
> hugetlb_instantiation_mutex, but only the mmap_sem (exclusively).
> This doesn't prevent other tasks from modifying the region
> structure, so it can be modified by two processes concurrently.
>
> To solve this, introduce a spinlock to resv_map and make the region
> manipulation functions grab it before they do actual work.
>
> Acked-by: David Gibson
> Signed-off-by: Joonsoo Kim
> [Updated changelog]
> Signed-off-by: Davidlohr Bueso
> ---
...
> @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
>  	 * Subtle, allocate a new region at the position but make it zero
>  	 * size such that we can guarantee to record the reservation. */
>  	if (&rg->link == head || t < rg->from) {
> -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> -		if (!nrg)
> -			return -ENOMEM;
> +		if (!nrg) {
> +			spin_unlock(&resv->lock);

I think that doing kmalloc() inside the lock is simpler.
Why do you unlock and retry here?

Thanks,
Naoya Horiguchi

> +			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> +			if (!nrg)
> +				return -ENOMEM;
> +
> +			goto retry;
> +		}
> +
>  	nrg->from = f;
>  	nrg->to = f;
>  	INIT_LIST_HEAD(&nrg->link);
>  	list_add(&nrg->link, rg->link.prev);
> +	nrg = NULL;
>
> -	return t - f;
> +	chg = t - f;
> +	goto out_locked;
>  	}
>
>  /* Round our left edge to the current segment if it encloses us. */
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote: From: Joonsoo Kim iamjoonsoo@lge.com There is a race condition if we map a same file on different processes. Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only the, mmap_sem (exclusively). This doesn't prevent other tasks from modifying the region structure, so it can be modified by two processes concurrently. To solve this, introduce a spinlock to resv_map and make region manipulation function grab it before they do actual work. Acked-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com [Updated changelog] Signed-off-by: Davidlohr Bueso davidl...@hp.com --- ... @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t) * Subtle, allocate a new region at the position but make it zero * size such that we can guarantee to record the reservation. */ if (rg-link == head || t rg-from) { - nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); - if (!nrg) - return -ENOMEM; + if (!nrg) { + spin_unlock(resv-lock); I think that doing kmalloc() inside the lock is simpler. Why do you unlock and retry here? Thanks, Naoya Horiguchi + nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); + if (!nrg) + return -ENOMEM; + + goto retry; + } + nrg-from = f; nrg-to = f; INIT_LIST_HEAD(nrg-link); list_add(nrg-link, rg-link.prev); + nrg = NULL; - return t - f; + chg = t - f; + goto out_locked; } /* Round our left edge to the current segment if it encloses us. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote: On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote: From: Joonsoo Kim iamjoonsoo@lge.com There is a race condition if we map a same file on different processes. Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only the, mmap_sem (exclusively). This doesn't prevent other tasks from modifying the region structure, so it can be modified by two processes concurrently. To solve this, introduce a spinlock to resv_map and make region manipulation function grab it before they do actual work. Acked-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com [Updated changelog] Signed-off-by: Davidlohr Bueso davidl...@hp.com --- ... @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t) * Subtle, allocate a new region at the position but make it zero * size such that we can guarantee to record the reservation. */ if (rg-link == head || t rg-from) { - nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); - if (!nrg) - return -ENOMEM; + if (!nrg) { + spin_unlock(resv-lock); I think that doing kmalloc() inside the lock is simpler. Why do you unlock and retry here? This is a spinlock, no can do -- we've previously debated this and since the critical region is quite small, a non blocking lock is better suited here. We do the retry so we don't race once the new region is allocated after the lock is dropped. Thanks, Davidlohr -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
Hi Davidlohr, On Mon, Jan 27, 2014 at 01:44:02PM -0800, Davidlohr Bueso wrote: On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote: On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote: From: Joonsoo Kim iamjoonsoo@lge.com There is a race condition if we map a same file on different processes. Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex. When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only the, mmap_sem (exclusively). This doesn't prevent other tasks from modifying the region structure, so it can be modified by two processes concurrently. To solve this, introduce a spinlock to resv_map and make region manipulation function grab it before they do actual work. Acked-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com [Updated changelog] Signed-off-by: Davidlohr Bueso davidl...@hp.com --- ... @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t) * Subtle, allocate a new region at the position but make it zero * size such that we can guarantee to record the reservation. */ if (rg-link == head || t rg-from) { - nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); - if (!nrg) - return -ENOMEM; + if (!nrg) { + spin_unlock(resv-lock); I think that doing kmalloc() inside the lock is simpler. Why do you unlock and retry here? This is a spinlock, no can do -- we've previously debated this and since the critical region is quite small, a non blocking lock is better suited here. We do the retry so we don't race once the new region is allocated after the lock is dropped. Using spinlock instead of rw_sem makes sense. But I'm not sure how the retry is essential to fix the race. (Sorry I can't find the discussion log about this.) As you did in your ver.1 (https://lkml.org/lkml/2013/7/26/296), simply doing like below seems to be fine to me, is it right? 
if (rg-link == head || t rg-from) { nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); if (!nrg) { chg = -ENOMEM; goto out_locked; } nrg-from = f; ... } In the current version nrg is initialized to NULL, so we always do retry once when adding new file_region. That's not optimal to me. If this retry is really essential for the fix, please comment the reason both in patch description and inline comment. It's very important for future code maintenance. And I noticed another point. I don't think the name of new goto label 'out_locked' is a good one. 'out_unlock' or 'unlock' is better. Thanks, Naoya Horiguchi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/8] mm, hugetlb: fix race in region tracking
On Mon, 2014-01-27 at 20:53 -0500, Naoya Horiguchi wrote:
> Hi Davidlohr,
>
> On Mon, Jan 27, 2014 at 01:44:02PM -0800, Davidlohr Bueso wrote:
> > On Mon, 2014-01-27 at 16:02 -0500, Naoya Horiguchi wrote:
> > > On Sun, Jan 26, 2014 at 07:52:21PM -0800, Davidlohr Bueso wrote:
> > > > From: Joonsoo Kim <iamjoonsoo@lge.com>
> > > >
> > > > There is a race condition if we map a same file on different processes.
> > > > Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex.
> > > > When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only
> > > > the mmap_sem (exclusively). This doesn't prevent other tasks from
> > > > modifying the region structure, so it can be modified by two processes
> > > > concurrently.
> > > >
> > > > To solve this, introduce a spinlock to resv_map and make region
> > > > manipulation function grab it before they do actual work.
> > > >
> > > > Acked-by: David Gibson <da...@gibson.dropbear.id.au>
> > > > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
> > > > [Updated changelog]
> > > > Signed-off-by: Davidlohr Bueso <davidl...@hp.com>
> > > > ---
> > > > ...
> > > > @@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
> > > >  	 * Subtle, allocate a new region at the position but make it zero
> > > >  	 * size such that we can guarantee to record the reservation. */
> > > >  	if (&rg->link == head || t < rg->from) {
> > > > -		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> > > > -		if (!nrg)
> > > > -			return -ENOMEM;
> > > > +		if (!nrg) {
> > > > +			spin_unlock(&resv->lock);
> > >
> > > I think that doing kmalloc() inside the lock is simpler.
> > > Why do you unlock and retry here?
> >
> > This is a spinlock, no can do -- we've previously debated this and since
> > the critical region is quite small, a non blocking lock is better suited
> > here. We do the retry so we don't race once the new region is allocated
> > after the lock is dropped.
>
> Using spinlock instead of rw_sem makes sense. But I'm not sure how the
> retry is essential to fix the race.
> (Sorry I can't find the discussion log about this.)
>
> As you did in your ver.1 (https://lkml.org/lkml/2013/7/26/296), simply
> doing like below seems to be fine to me, is it right?
>
> 	if (&rg->link == head || t < rg->from) {
> 		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> 		if (!nrg) {
> 			chg = -ENOMEM;
> 			goto out_locked;
> 		}
> 		nrg->from = f;
> 		...
> 	}

That's nice and simple because we were using the rwsem version.

> In the current version nrg is initialized to NULL, so we always do retry
> once when adding new file_region. That's not optimal to me.

Right, the retry can only occur once.

> If this retry is really essential for the fix, please comment the reason
> both in patch description and inline comment. It's very important for
> future code maintenance.

So we locate the corresponding region in the reserve map, and if we are
below the current region, then we allocate a new one. Since we dropped
the lock to allocate memory, we have to make sure that we still need the
new region and that we don't race with the new status of the reservation
map. This is the whole point of the retry, and I don't see it being
suboptimal. We just cannot retake the lock after we get the new region
and just add it to the list.

> And I noticed another point. I don't think the name of new goto label
> 'out_locked' is a good one. 'out_unlock' or 'unlock' is better.

What worries me more is that we're actually freeing a valid new region
(nrg) upon exit. We certainly don't do so in the current code, and it
doesn't seem to be a leak. Instead, we should be doing:

	if (&rg->link == head || t < rg->from) {
		if (!nrg) {
			spin_unlock(&resv->lock);
			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
			if (!nrg)
				return -ENOMEM;
			goto retry;
		}

		nrg->from = f;
		nrg->to = f;
		INIT_LIST_HEAD(&nrg->link);
		list_add(&nrg->link, rg->link.prev);

		chg = t - f;
		goto out;
	}
	...
out:
	spin_unlock(&resv->lock);
	return chg;

Thanks,
Davidlohr
[PATCH 3/8] mm, hugetlb: fix race in region tracking
From: Joonsoo Kim <iamjoonsoo@lge.com>

There is a race condition if we map a same file on different processes.
Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex.
When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only
the mmap_sem (exclusively). This doesn't prevent other tasks from
modifying the region structure, so it can be modified by two processes
concurrently.

To solve this, introduce a spinlock to resv_map and make region
manipulation function grab it before they do actual work.

Acked-by: David Gibson <da...@gibson.dropbear.id.au>
Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
[Updated changelog]
Signed-off-by: Davidlohr Bueso <davidl...@hp.com>
---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 48
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 29c9371..db556f3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -26,6 +26,7 @@ struct hugepage_subpool {
 struct resv_map {
 	struct kref refs;
+	spinlock_t lock;
 	struct list_head regions;
 };
 extern struct resv_map *resv_map_alloc(void);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 572866d..6b40d7e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -135,15 +135,8 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
  * Region tracking -- allows tracking of reservations and instantiated pages
  *                    across the pages in a mapping.
  *
- * The region data structures are protected by a combination of the mmap_sem
- * and the hugetlb_instantiation_mutex. To access or modify a region the caller
- * must either hold the mmap_sem for write, or the mmap_sem for read and
- * the hugetlb_instantiation_mutex:
- *
- *	down_write(&mm->mmap_sem);
- * or
- *	down_read(&mm->mmap_sem);
- *	mutex_lock(&hugetlb_instantiation_mutex);
+ * The region data structures are embedded into a resv_map and
+ * protected by a resv_map's lock
  */
 struct file_region {
 	struct list_head link;
@@ -156,6 +149,7 @@ static long region_add(struct resv_map *resv, long f, long t)
 	struct list_head *head = &resv->regions;
 	struct file_region *rg, *nrg, *trg;
 
+	spin_lock(&resv->lock);
 	/* Locate the region we are either in or before. */
 	list_for_each_entry(rg, head, link)
 		if (f <= rg->to)
@@ -185,15 +179,18 @@ static long region_add(struct resv_map *resv, long f, long t)
 	}
 	nrg->from = f;
 	nrg->to = t;
+	spin_unlock(&resv->lock);
 	return 0;
 }
 
 static long region_chg(struct resv_map *resv, long f, long t)
 {
 	struct list_head *head = &resv->regions;
-	struct file_region *rg, *nrg;
+	struct file_region *rg, *nrg = NULL;
 	long chg = 0;
 
+retry:
+	spin_lock(&resv->lock);
 	/* Locate the region we are before or in. */
 	list_for_each_entry(rg, head, link)
 		if (f <= rg->to)
@@ -203,15 +200,23 @@ static long region_chg(struct resv_map *resv, long f, long t)
 	 * Subtle, allocate a new region at the position but make it zero
 	 * size such that we can guarantee to record the reservation. */
 	if (&rg->link == head || t < rg->from) {
-		nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
-		if (!nrg)
-			return -ENOMEM;
+		if (!nrg) {
+			spin_unlock(&resv->lock);
+			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
+			if (!nrg)
+				return -ENOMEM;
+
+			goto retry;
+		}
+
 		nrg->from = f;
 		nrg->to = f;
 		INIT_LIST_HEAD(&nrg->link);
 		list_add(&nrg->link, rg->link.prev);
+		nrg = NULL;
 
-		return t - f;
+		chg = t - f;
+		goto out_locked;
 	}
 
 	/* Round our left edge to the current segment if it encloses us. */
@@ -224,7 +229,7 @@ static long region_chg(struct resv_map *resv, long f, long t)
 		if (&rg->link == head)
 			break;
 		if (rg->from > t)
-			return chg;
+			goto out_locked;
 
 		/* We overlap with this area, if it extends further than
 		 * us then we must extend ourselves.  Account for its
@@ -235,6 +240,10 @@ static long region_chg(struct resv_map *resv, long f, long t)
 		}
 		chg -= rg->to - rg->from;
 	}
+
+out_locked:
+	spin_unlock(&resv->lock);
+	kfree(nrg);
 	return chg;
 }
 
@@ -244,12 +253,13 @@ static long region_truncate(struct resv_map *resv, long end)
 	struct file_region *rg, *trg;
 	long chg = 0;
 
+	spin_lock(&resv->lock);
 	/* Locate the region we are either in or before. */
 	list_for_each_entry(rg, head, link)
 		if (end <= rg->to)
 			break;
 	if (&rg->link == head)
-