Re: Linux 3.15 .. and continuation of merge window

2014-06-11 Thread J. R. Okajima

Al Viro:
> The former guarantees that the address we are doing trylock on would be that
> of a live dentry.  The latter makes sure that anything assigned to 
> dentry->d_parent after we drop ->d_lock will not be freed until we drop
> rcu_read_lock.

Although I don't know how to reproduce the problem for the latter case,
I think the patch is good.


J. R. Okajima
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 3.15 .. and continuation of merge window

2014-06-11 Thread Al Viro
On Wed, Jun 11, 2014 at 06:32:58PM +0900, J. R. Okajima wrote:
> 
> Al Viro:
> > So I suspect that the right fix is a bit trickier - in addition to check
> > on the fast path (i.e. when trylock gets us the lock on parent), we need
> > to
> > * get rcu_read_lock() before dropping ->d_lock.
> > * check if dentry is already doomed right after taking rcu_read_lock();
> > if not, any value we might see in ->d_parent afterwards will point to object
> > not freed until we drop rcu_read_lock.
> >
> > IOW, something like the delta below.  Comments?
> 
> I will try testing later.
> For now, as a comment before testing, the patch looks weird for me. It
> checks d_lockref.count twice during d_lockref.lock held. It must be the
> same result, isn't it?

Right you are.  So what we need is
* check that thing once, as in your variant (I'd still prefer to
check ->d_lockref.count instead of ->d_flags, but it's the same thing being
tested)
* ... and get rcu_read_lock() *before* dropping ->d_lock.

The former guarantees that the address we are doing trylock on would be that
of a live dentry.  The latter makes sure that anything assigned to 
dentry->d_parent after we drop ->d_lock will not be freed until we drop
rcu_read_lock.

Signed-off-by: Al Viro 
---
diff --git a/fs/dcache.c b/fs/dcache.c
index be2bea8..e99c6f5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -532,10 +532,12 @@ static inline struct dentry *lock_parent(struct dentry 
*dentry)
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
+   if (unlikely((int)dentry->d_lockref.count < 0))
+   return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
-   spin_unlock(&dentry->d_lock);
rcu_read_lock();
+   spin_unlock(&dentry->d_lock);
 again:
parent = ACCESS_ONCE(dentry->d_parent);
spin_lock(&parent->d_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 3.15 .. and continuation of merge window

2014-06-11 Thread J. R. Okajima

Al Viro:
> So I suspect that the right fix is a bit trickier - in addition to check
> on the fast path (i.e. when trylock gets us the lock on parent), we need
> to
>   * get rcu_read_lock() before dropping ->d_lock.
>   * check if dentry is already doomed right after taking rcu_read_lock();
> if not, any value we might see in ->d_parent afterwards will point to object
> not freed until we drop rcu_read_lock.
>
> IOW, something like the delta below.  Comments?

I will try testing later.
For now, as a comment before testing, the patch looks weird for me. It
checks d_lockref.count twice during d_lockref.lock held. It must be the
same result, isn't it? Or does it mean that denty can be handled by
lockref_mark_dead() even if d_lockref.lock is held?


J. R. Okajima
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 3.15 .. and continuation of merge window

2014-06-10 Thread Al Viro
On Mon, Jun 09, 2014 at 12:30:34PM +0900, J. R. Okajima wrote:
> 
> Linus Torvalds:
> > So I ended up doing an rc8 because I was a bit worried about some
> > last-minute dcache fixes, but it turns out that nobody seemed to even
> > notice those. We did have other issues during the week, though, so it
>   :::
> 
> I am afraid there is a problem in dcache. Please read
> http://marc.info/?l=linux-fsdevel&m=140214911608925&w=2

There is a problem, all right, but your fix doesn't really fix it - just
narrows the race window ;-/  I would really like to detach the bugger
as soon as __dentry_kill() removes it from the list of children, but
unfortunately NFS wants it to be still valid in ->d_iput() (BTW,
nfs_can_unlink() is doing something very odd -
parent = dget_parent(dentry);
if (parent == NULL)
goto out_free;
is pointless, since dget_parent() never returns NULL; what was that
check trying to accomplish?)

Your scenario isn't feasible as described, but something similar can
happen with *two* shrinkers racing - dirB might've ended up on one list,
with fileC looked up a bit later, ending up on another list right after
that.  So the problem is real; unfortunately, DCACHE_DENTRY_KILLED might
have appeared right after we'd dropped ->d_lock.

So I suspect that the right fix is a bit trickier - in addition to check
on the fast path (i.e. when trylock gets us the lock on parent), we need
to
* get rcu_read_lock() before dropping ->d_lock.
* check if dentry is already doomed right after taking rcu_read_lock();
if not, any value we might see in ->d_parent afterwards will point to object
not freed until we drop rcu_read_lock.

IOW, something like the delta below.  Comments?

PS: apologies for being MIA; caught some crap, spent the last week being
very unhappy ;-/  I'll send a pull request tomorrow morning.

diff --git a/fs/dcache.c b/fs/dcache.c
index be2bea8..65ec10f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -532,10 +532,16 @@ static inline struct dentry *lock_parent(struct dentry 
*dentry)
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
+   if (unlikely((int)dentry->d_lockref.count < 0))
+   return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
-   spin_unlock(&dentry->d_lock);
rcu_read_lock();
+   if (unlikely((int)dentry->d_lockref.count < 0)) {
+   rcu_read_unlock();
+   return NULL;
+   }
+   spin_unlock(&dentry->d_lock);
 again:
parent = ACCESS_ONCE(dentry->d_parent);
spin_lock(&parent->d_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 3.15 .. and continuation of merge window

2014-06-08 Thread J. R. Okajima

Linus Torvalds:
> So I ended up doing an rc8 because I was a bit worried about some
> last-minute dcache fixes, but it turns out that nobody seemed to even
> notice those. We did have other issues during the week, though, so it
:::

I am afraid there is a problem in dcache. Please read
http://marc.info/?l=linux-fsdevel&m=140214911608925&w=2


For 3.16, please consider these patches.

- for vfs
  [PATCH v2] vfs: get_next_ino(), never inum=0
  http://marc.info/?l=linux-fsdevel&m=140128600801771&w=2

- for tmpfs
  [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure)
  http://marc.info/?l=linux-fsdevel&m=140197128021025&w=2
  [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum
  http://marc.info/?l=linux-fsdevel&m=140197128321029&w=2
  [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting
  http://marc.info/?l=linux-fsdevel&m=140197128121026&w=2


J. R. Okajima
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Linux 3.15 .. and continuation of merge window

2014-06-08 Thread Linus Torvalds
So I ended up doing an rc8 because I was a bit worried about some
last-minute dcache fixes, but it turns out that nobody seemed to even
notice those. We did have other issues during the week, though, so it
was just as well. The futex fixes and cleanups may stand out, but as
usual there's various other random fixes since rc8 in there too:
mainly drivers (drm, networking, sound, usb etc), networking,
scheduling and perf tooling.

But it's all been fairly small and quiet, which *may* of course be due
to the fact that last week was also the first week of the merge window
for 3.16. That might have distracted some developers. I'm not entirely
convinced I liked the overlap, but it seemed to work ok, and unless
people scream really loudly ("Please don't _ever_ do that again") and
give good reasons for doing so, I might end up doing that overlapping
merge window in the future too if it ends up helping out with some
particular timing issue.

That said, I also don't think it was such a wonderful experience that
I'd want to necessarily do the overlap every time, without a good
specific reason for doing so. It was kind of nice being productive
during the last week or rc (which is usually quite boring and dead),
but I think it might be a distraction when people should be worrying
about the stability of the rc.

Of course, maybe the overlap ends up meaning that we get less noise
during the last week of stabilization, and it actually helps. It could
go either way. I'd be interested to hear what people thought, although
I _suspect_ most people don't feel strongly either way.

Anyway, with 3.15 released, my "master" branch has already merged the
work in my "next" branch on my local machine, and I'll be
decommissioning the "next" branch once I push that all out.  After
that, any future merge window work will happen on "master", and we'll
be back to the normal single-branch model for my tree.

 Linus

---

Alan Stern (1):
  USB: Avoid runtime suspend loops for HCDs that can't handle suspend/resume

Aleksander Morgado (3):
  net: qmi_wwan: add Netgear AirCard 341U
  net: qmi_wwan: add additional Sierra Wireless QMI devices
  net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI

Alex Deucher (1):
  drm/radeon/dpm: resume fixes for some systems

Alexej Starschenko (1):
  USB: serial: option: add support for Novatel E371 PCIe card

Andrey Ryabinin (1):
  mm: rmap: fix use-after-free in __put_anon_vma

Andy Lutomirski (1):
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET

Bart De Schuymer (1):
  ebtables: Update MAINTAINERS entry.

Ben Hutchings (2):
  Staging: speakup: Move pasting into a work item
  Staging: speakup: Update __speakup_paste_selection() tty
(ab)usage to match vt

Bjørn Mork (1):
  usb: cdc-wdm: export cdc-wdm uapi header

Christian König (3):
  drm/radeon: fix vm buffer size estimation
  drm/radeon: sync page table updates
  drm/radeon: use the CP DMA on CIK

Dan Carpenter (1):
  qlcnic: info leak in qlcnic_dcb_peer_app_info()

Dave Young (2):
  x86/efi: earlyprintk=efi,keep fix
  x86/efi: Do not export efi runtime map in case old map

Dirk Brandewie (3):
  intel_pstate: Remove C0 tracking
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: add sample time scaling

Doug Smythies (1):
  intel_pstate: Improve initial busy calculation

Emmanuel Grumbach (1):
  iwlwifi: mvm: disable beacon filtering

Eric Dumazet (1):
  net: fix inet_getid() and ipv6_select_ident() bugs

Eric W. Biederman (1):
  netlink: Only check file credentials for implicit destinations

Filipe Manana (1):
  Btrfs: send, fix corrupted path strings for long paths

George McCollister (1):
  USB: ftdi_sio: add NovaTech OrionLXm product ID

Greg Kroah-Hartman (1):
  USB: cdc-wdm: properly include types.h

Ian Abbott (1):
  staging: comedi: ni_daq_700: add mux settling delay

Igor Mammedov (3):
  x86: Fix list/memory corruption on CPU hotplug
  x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it

Ivan Mikhaylov (2):
  emac: add missing support of 10mbit in emac/rgmii
  emac: aggregation of v1-2 PLB errors for IER register

Jack Morgenstein (1):
  net/mlx4_core: Reset RoCE VF gids when guest driver goes down

Jean Delvare (1):
  net: ec_bhf: Add runtime dependencies

Jianyu Zhan (1):
  kernfs: move the last knowledge of sysfs out from kernfs

Jiri Pirko (1):
  team: fix mtu setting

Johan Hovold (1):
  USB: io_ti: fix firmware download on big-endian machines (part 2)

Jon Maxwell (1):
  bridge: notify user space after fdb update

Kirill Tkhai (1):
  sched/dl: Fix race in dl_task_timer()

Kristian Evensen (1):
  ipheth: Add support for iPad 2 and iPad 3

Leon Yu (1):
  net: filter: fix possible memory leak in __sk_prepare_filter()

Lin