Re: freeBSD nullfs together with nfs and silly rename
On Sun, 6 Jun 2010 16:44:43 +0200 Leon Meßner wrote:

 LM> Hi,
 LM> I hope this is not the wrong list to ask. Didn't get any answers on
 LM> -questions.

 LM> When you try to do the following inside a nullfs mounted directory,
 LM> where the nullfs origin is itself mounted via nfs you get an error:

 LM> # > foo
 LM> # tail -f foo &
 LM> # rm -f foo
 LM> tail: foo: Stale NFS file handle
 LM> # fg

 LM> This is really a problem when running services inside jails and using
 LM> NFS as storage. As of [2] it looks like this problem is known for a
 LM> while. On a normal NFS mount this does not happen as silly renaming
 LM> [1] works there (producing nasty little .nfs files).

nfs_sillyrename() is called when the vnode's usecount is more than 1. The
unlink() syscall is expected to increase the vnode's usecount in namei(),
so if the file has already been opened, usecount will be more than 1.

But with a nullfs layer present, the reference counts are held by the upper
vnode, not the lower (nfs) one. When unlink() is called it increases the
usecount of the upper vnode, not the nfs vnode, and nfs_sillyrename() is
never called.

The straightforward solution seems to be to implement null_remove() so that
it increases the lower vnode's refcount before calling null_bypass() and
decrements it after the call.

See the attached patch (it works for me on both 8-STABLE and CURRENT).

--
Mikolaj Golub
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
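The reference-count argument above can be followed in a small user-space
model. This is only a sketch under stated assumptions, not kernel code: the
toy_* names and the struct are hypothetical, and the only thing taken from
the message is the rule "silly rename when the NFS vnode's usecount is more
than 1" plus the fact that nullfs keeps the extra references on its upper
vnode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy vnode (hypothetical): just enough state for the illustration. */
struct toy_vnode {
	int usecount;			/* active references */
	struct toy_vnode *lower;	/* non-NULL only for a nullfs upper vnode */
};

/* The rule from the message: NFS silly-renames only if usecount > 1. */
bool
toy_nfs_would_sillyrename(const struct toy_vnode *nfs_vp)
{
	return (nfs_vp->usecount > 1);
}

/* open() and the namei() lookup both reference the vnode they are given. */
void
toy_ref(struct toy_vnode *vp)
{
	vp->usecount++;
}

/*
 * unlink() through the vnode the caller sees. Without a null_remove(), the
 * call is passed straight down, so NFS inspects the lower vnode's count
 * while the namei() reference landed on the upper vnode.
 */
bool
toy_unlink_sillyrenames(struct toy_vnode *vp)
{
	struct toy_vnode *nfs_vp = (vp->lower != NULL) ? vp->lower : vp;
	bool silly;

	toy_ref(vp);			/* reference taken by namei() */
	silly = toy_nfs_would_sillyrename(nfs_vp);
	vp->usecount--;			/* reference dropped after the call */
	return (silly);
}
```

In this model an open file on a plain NFS mount reaches a count of 2 during
unlink(), while behind nullfs the NFS vnode stays at the single reference
held by the upper vnode, which matches the failure described above.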
Re: freeBSD nullfs together with nfs and silly rename
On Sat, 12 Jun 2010 11:56:10 +0300 Mikolaj Golub wrote to Leon Meßner:

 MG> See the attached patch (it works for me on both 8-STABLE and CURRENT).

Sorry, actually here is the patch.

--
Mikolaj Golub

Index: sys/fs/nullfs/null_vnops.c
===================================================================
--- sys/fs/nullfs/null_vnops.c	(revision 208960)
+++ sys/fs/nullfs/null_vnops.c	(working copy)
@@ -499,6 +499,23 @@
 }
 
 /*
+ * Increasing refcount of lower vnode is needed at least for the case
+ * when lower FS is NFS to do sillyrename if the file is in use.
+ */
+static int
+null_remove(struct vop_remove_args *ap)
+{
+	int retval;
+	struct vnode *lvp;
+
+	lvp = NULLVPTOLOWERVP(ap->a_vp);
+	VREF(lvp);
+	retval = null_bypass(&ap->a_gen);
+	vrele(lvp);
+	return (retval);
+}
+
+/*
  * We handle this to eliminate null FS to lower FS
  * file moving. Don't know why we don't allow this,
  * possibly we should.
@@ -809,6 +826,7 @@
 	.vop_open =		null_open,
 	.vop_print =		null_print,
 	.vop_reclaim =		null_reclaim,
+	.vop_remove =		null_remove,
 	.vop_rename =		null_rename,
 	.vop_setattr =		null_setattr,
 	.vop_strategy =		VOP_EOPNOTSUPP,
Re: freeBSD nullfs together with nfs and silly rename
On Sat, Jun 12, 2010 at 11:56:10AM +0300, Mikolaj Golub wrote:
> But with nullfs layer present the reference counts are held by the upper
> node, not the lower (nfs) one, so when unlink() is called it increases
> usecount of the upper vnode, not nfs vnode and nfs_sillyrename() is never
> called.
>
> The straightforward solution looks like to implement null_remove() that
> will increase lower vnode's refcount before calling null_bypass() and then
> decrement it after the call.
>
> See the attached patch (it works for me on both 8-STABLE and CURRENT).

The upper vnode holds a reference to the lower vnode, as you noted.
Now, with your patch, I believe that _all_ calls to nfs_remove()
happen with refcount > 1.
Re: Re: freeBSD nullfs together with nfs and silly rename
On Sat, 12 Jun 2010, Kostik Belousov wrote:
> The upper vnode holds a reference to the lower vnode, as you noted.
> Now, with your patch, I believe that _all_ calls to the nfs_remove()
> are happen with refcount > 1.

I'm not familiar with the nullfs so this might be way off, but would
this patch be ok by any chance?

Index: sys/fs/nullfs/null_vnops.c
===================================================================
--- sys/fs/nullfs/null_vnops.c	(revision 208960)
+++ sys/fs/nullfs/null_vnops.c	(working copy)
@@ -499,6 +499,23 @@
 }
 
 /*
+ * Increasing refcount of lower vnode is needed at least for the case
+ * when lower FS is NFS to do sillyrename if the file is in use.
+ */
+static int
+null_remove(struct vop_remove_args *ap)
+{
+	int retval;
+	struct vnode *lvp;
+
+	if (ap->a_vp->v_usecount > 1) {
+		lvp = NULLVPTOLOWERVP(ap->a_vp);
+		VREF(lvp);
+	} else
+		lvp = NULL;
+	retval = null_bypass(&ap->a_gen);
+	if (lvp != NULL)
+		vrele(lvp);
+	return (retval);
+}
+
+/*
  * We handle this to eliminate null FS to lower FS
  * file moving. Don't know why we don't allow this,
  * possibly we should.
@@ -809,6 +826,7 @@
 	.vop_open =		null_open,
 	.vop_print =		null_print,
 	.vop_reclaim =		null_reclaim,
+	.vop_remove =		null_remove,
 	.vop_rename =		null_rename,
 	.vop_setattr =		null_setattr,
 	.vop_strategy =		VOP_EOPNOTSUPP,
Re: Re: freeBSD nullfs together with nfs and silly rename
On Sat, Jun 12, 2010 at 11:15:49AM -0400, Rick Macklem wrote:
> I'm not familiar with the nullfs so this might be way off, but would
> this patch be ok by any chance?
>
> Index: sys/fs/nullfs/null_vnops.c
> ===================================================================
> --- sys/fs/nullfs/null_vnops.c	(revision 208960)
> +++ sys/fs/nullfs/null_vnops.c	(working copy)
> @@ -499,6 +499,23 @@
>  }
>
>  /*
> + * Increasing refcount of lower vnode is needed at least for the case
> + * when lower FS is NFS to do sillyrename if the file is in use.
> + */
> +static int
> +null_remove(struct vop_remove_args *ap)
> +{
> +	int retval;
> +	struct vnode *lvp;
> +
> +	if (ap->a_vp->v_usecount > 1) {
> +		lvp = NULLVPTOLOWERVP(ap->a_vp);
> +		VREF(lvp);
> +	} else
> +		lvp = NULL;
> +	retval = null_bypass(&ap->a_gen);
> +	if (lvp != NULL)
> +		vrele(lvp);
> +	return (retval);
> +}
> +
> +/*

Yes, I hoped that Mikolaj ends up with something similar :).

Please note that this is racy, since we cannot know why usecount is greater
than 1. This might cause the silly rename to kick in some time where it
should not, but the race is rare.
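The difference between the two null_remove() variants posted in this thread
can be sketched in user space. This is a toy model under assumptions, not
the kernel code: the toy_* names and the struct are hypothetical, the lower
vnode is assumed to start with the single reference held by the upper vnode,
and the NFS side is reduced to the "usecount > 1" check described earlier in
the thread.

```c
#include <stdbool.h>

/* Toy lower (NFS) vnode, hypothetical: only the reference count matters. */
struct toy_vnode {
	int usecount;
};

/* First variant: always reference the lower vnode around the call. */
bool
toy_remove_unconditional(int upper_usecount, struct toy_vnode *lower)
{
	bool silly;

	(void)upper_usecount;		/* this variant never looks at it */
	lower->usecount++;		/* VREF(lvp) */
	silly = (lower->usecount > 1);	/* check applied on the NFS side */
	lower->usecount--;		/* vrele(lvp) */
	return (silly);
}

/* Second variant: reference it only if the upper vnode is otherwise in use. */
bool
toy_remove_conditional(int upper_usecount, struct toy_vnode *lower)
{
	bool took_ref = (upper_usecount > 1);
	bool silly;

	if (took_ref)
		lower->usecount++;
	silly = (lower->usecount > 1);
	if (took_ref)
		lower->usecount--;
	return (silly);
}
```

With an unopened file (upper usecount 1, from namei() only) the
unconditional variant still reports a silly rename, while the conditional
one does not. The race mentioned above corresponds to upper_usecount being
above 1 for a reason other than an open file; the model cannot tell those
cases apart either.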
Re: Kernel panic when unpluggin AC adaptor
2010/6/11 John Baldwin <j...@freebsd.org>:
> On Friday 11 June 2010 6:27:48 am Giovanni Trematerra wrote:
>> On Thu, Jun 10, 2010 at 10:58 PM, Giovanni Trematerra
>> <giovanni.tremate...@gmail.com> wrote:
>>> On Tue, May 4, 2010 at 6:35 PM, David DEMELIER
>>> <demelier.da...@gmail.com> wrote:
>>>> Good news ! It worked, check the picture here :
>>>> http://img63.imageshack.us/img63/4244/dsc00361g.jpg
>>>
>>> In sys/dev/acpica/acpi_cpu.c, at the end of acpi_cpu_notify (a per-CPU
>>> notification handler called when _CST objects change), the global
>>> cpu_cx_count is set to the greatest value of all the per-CPU
>>> sc->cpu_cx_count variables. That could result in a panic, as David
>>> reported, because it allows acpi_cpu_global_cx_lowest_sysctl to be
>>> invoked from /etc/rc.d/power_profile, when the AC adapter is
>>> unplugged, with a value that not all the CPUs can handle in
>>> acpi_cpu_idle.
>>> The patch also changes the global cpu_cx_lowest according to the new
>>> value of the global cpu_cx_count if needed.
>>>
>>> David Demelier did great work testing every patch I sent him to
>>> identify the source of the problem.
>>>
>>> Please, let me know your comments and possibly commit the patch if you
>>> think is good enough.
>>
>> As jhb@ pointed out to me in private, with the previous patch a CPU
>> could never enter the lowest Cx-state even if it gained one.
>> So I'd like to propose this new solution.
>> When the hw.acpi.cpu.cx_lowest sysctl is set, the global handler in
>> sys/dev/acpi_cpu.c will set the greatest sc->cpu_cx_lowest value
>> supported by each CPU, and not the same value for all CPUs.
>> Later, when a CPU possibly gains new Cx-states, the acpi_cpu_notify
>> handler will set sc->cpu_cx_lowest according to the global cx_lowest
>> and the Cx-states supported by the CPU.
>>
>> Now I think that the /etc/rc.d/power_profile script has a problem, but
>> that is a different story.
>> The script selects the lowest_value by querying only the Cx-states of
>> dev.cpu.0. If different CPUs may have different Cx-states, the script
>> should use as lowest_value the lowest value among all the CPUs.
>
> Yes.
>
>> Please, let me know your comments and possibly commit the patch if you
>> think is good enough.
>
> I think this is a good compromise for now.
>
> --
> John Baldwin

Thanks for Giovanni's patience and work, he did a lot of research to
solve this little problem :-). Is there a chance that this patch will
appear in 8.1-RELEASE?

Kind regards.

--
Demelier David
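The per-CPU clamping described in the quoted proposal, and the "lowest value
among all the CPUs" that the power_profile script would need, can be
sketched as follows. This is a hedged illustration only, not the driver
code: the toy_* functions are hypothetical, and Cx-states are represented
here as plain state numbers (1 for C1, 2 for C2, ...), which is a
simplification chosen for the example.

```c
#include <stddef.h>

/*
 * Per-CPU setting: use the requested (global) deepest state, but never a
 * state deeper than this CPU currently supports.
 */
int
toy_clamp_cx_lowest(int requested_state, int cpu_cx_count)
{
	return (requested_state < cpu_cx_count) ? requested_state : cpu_cx_count;
}

/*
 * Deepest state that every CPU supports: the minimum of the per-CPU state
 * counts. Returns 0 if no CPUs are given.
 */
int
toy_common_cx_state(const int *cx_counts, size_t ncpu)
{
	int common;

	if (ncpu == 0)
		return (0);
	common = cx_counts[0];
	for (size_t i = 1; i < ncpu; i++) {
		if (cx_counts[i] < common)
			common = cx_counts[i];
	}
	return (common);
}
```

For two CPUs that support 3 and 2 states, a request for C3 would leave the
first CPU at C3 and clamp the second to C2, while a script looking for one
value valid everywhere would pick C2.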
Re: Kernel panic when unpluggin AC adaptor
On Sat, Jun 12, 2010 at 12:19 PM, David DEMELIER
<demelier.da...@gmail.com> wrote:
> Thanks for Giovanni's patience and work, he made a lot of research to
> solve this little problem :-). Is there a chance that this patch
> appears in 8.1-RELEASE ?
>
> Kind regards.
>
> --
> Demelier David

Someone would have to discuss with the Release Engineering team, but
it would be cool :)

-Brandon
Re: Re: Re: freeBSD nullfs together with nfs and silly rename
On Sat, 12 Jun 2010, Kostik Belousov wrote:
> Yes, I hoped that Mikolaj ends up with something similar :).
>
> Please note that this is racy, since we cannot know why usecount is
> greater than 1. This might cause the silly rename to kick in some time
> where it should not, but the race is rare.

I'd say that having silly rename happen once in a while for unlink when
it doesn't have to happen is better than having the file deleted on the
server while it is still open on the client.

rick
nfsv4_server_enable=YES: link_elf: symbol svcpool_destroy undefined
Hello!

I'm trying to start the experimental NFSv4 server in RELENG_8 w/o building
it into the kernel, as nfsv4(4) suggests:

  ... or start mountd(8) and nfsd(8) with the ``-e'' option to force use of
  the experimental server. The nfsuserd(8) daemon must also be running.
  This will occur if

        nfs_server_enable=YES
        nfsv4_server_enable=YES
        nfsuserd_enable=YES

  are set in rc.conf(5).

However, mountd fails to start nfsd; the same problem exists when doing it
by hand: kldload nfsd gives the error

  kernel: link_elf: symbol svcpool_destroy undefined

Can this problem be solved w/o building a kernel with options NFSD?

--
Sincerely, Dmitry
nic-hdl: LYNX-RIPE
Re: nfsv4_server_enable=YES: link_elf: symbol svcpool_destroy undefined
On Sun, 13 Jun 2010, Dmitry Pryanishnikov wrote:
> However, mountd fails to start nfsd; the same problem exists when doing
> it by hands: kldload nfsd gives
>
>   kernel: link_elf: symbol svcpool_destroy undefined
>
> error. Can this problem be solved w/o building kernel with options NFSD?

Well, if you build a kernel with any of the options that cause krpc to be
compiled into the kernel, it works. (I usually test with a GENERIC kernel
that has NFSCLIENT and NFSSERVER defined in it, so nfsd.ko loads fine.)

Basically nfsd is defined as dependent on nfscommon, then nfscommon is
defined as dependent on krpc and nfssvc. This gets everything to load, but
when it tries to load nfsd.ko, it can't find the symbols in krpc.ko or
nfssvc.ko if they weren't linked into the kernel.

For example, here's what I saw:

nfsv4-laptop# kldstat
Id Refs Address    Size     Name
 1   12 0xc040     d1f338   kernel
 4    1 0xc2eff000 1e000    nfsclient.ko
 5    1 0xc2ea9000 2000     nfs_common.ko
 6    2 0xc2f1d000 15000    krpc.ko
11    1 0xc2fe3000 16000    nfscommon.ko
12    1 0xc2fc5000 2000     nfssvc.ko
nfsv4-laptop# nm /boot/nkernel/krpc.ko | fgrep svcpool
cdf0 t svcpool_active
de40 t svcpool_create
e590 t svcpool_destroy
e1d0 t svcpool_maxthread_sysctl
e2b0 t svcpool_minthread_sysctl

and nfsd wouldn't load because it couldn't find svcpool_destroy, just like
you saw.

If you apply this patch and rebuild the module, it will find the symbols.
(Is that what is supposed to happen or is something broken?)

--- fs/nfsserver/nfs_nfsdport.c.sav	2010-06-12 20:27:53.0 -0400
+++ fs/nfsserver/nfs_nfsdport.c	2010-06-12 20:37:09.0 -0400
@@ -3147,4 +3147,6 @@
 MODULE_VERSION(nfsd, 1);
 MODULE_DEPEND(nfsd, nfscommon, 1, 1, 1);
 MODULE_DEPEND(nfsd, nfslockd, 1, 1, 1);
+MODULE_DEPEND(nfsd, krpc, 1, 1, 1);
+MODULE_DEPEND(nfsd, nfssvc, 1, 1, 1);
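The behaviour reported above (a symbol in krpc.ko is not visible to nfsd.ko
unless nfsd declares the dependency itself) can be pictured with a toy
resolver. This is a sketch under the assumption that a module's undefined
symbols are looked up only in the modules it directly depends on; the
toy_module structure and toy_* functions are hypothetical and are not the
kernel linker's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 4	/* arbitrary small limit for the illustration */

/* Hypothetical module record: exported symbols and direct dependencies. */
struct toy_module {
	const char *name;
	const char *symbols[TOY_MAX];		/* NULL-terminated if shorter */
	const struct toy_module *deps[TOY_MAX];	/* NULL-terminated if shorter */
};

/* Does this module itself export the symbol? */
bool
toy_exports(const struct toy_module *m, const char *sym)
{
	for (size_t i = 0; i < TOY_MAX && m->symbols[i] != NULL; i++) {
		if (strcmp(m->symbols[i], sym) == 0)
			return (true);
	}
	return (false);
}

/*
 * Resolve a symbol for module m by searching its direct dependencies only.
 * Dependencies of dependencies are deliberately not followed.
 */
bool
toy_resolve(const struct toy_module *m, const char *sym)
{
	for (size_t i = 0; i < TOY_MAX && m->deps[i] != NULL; i++) {
		if (toy_exports(m->deps[i], sym))
			return (true);
	}
	return (false);
}
```

In this model, an nfsd that lists only nfscommon cannot resolve
svcpool_destroy even though nfscommon depends on krpc; adding krpc as a
direct dependency, as the MODULE_DEPEND lines in the patch do, makes the
lookup succeed.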
Re: FreeBSD eats 169.254.x.x addressed packets
Guy Helmer wrote:
> My previous understanding was that RFC 3927 did not allow transmitting
> datagrams involving the 169.254.0.0/16 link-local prefix; now that I've
> looked over the RFC more closely, I'm not sure that is the case. I have
> cc'ed Bruce Simpson on this message in hopes that he can shed some light
> on this. I believe he committed the change that disallowed transmitting
> from 169.254.0.0/16 addresses.

RFC 3927 is pretty clear that 169.254.0.0/16 traffic is not to be forwarded
beyond the link. I do not understand why the OP is losing traffic, unless
he's relying on pre-RFC 3927 behaviour in his network topology.

The IN_LINKLOCAL() check happens after ip_input() walks the address hash
looking for exact address matches. So if an interface has a link-local
address, the packet should get delivered upstack as usual.

When I made this change, link-local addressing couldn't be fully
implemented in FreeBSD, due to the lack of support for address scopes in
the FreeBSD IPv4 code. Hopefully new people can pick up on it as they wish.

thanks
BMS
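The ordering described above (exact local-address match first, link-local
check afterwards) can be sketched in user space. This is an illustration
under assumptions, not the ip_input() code: the toy_* names and the enum are
hypothetical, addresses are taken in host byte order, and the check is
assumed to apply to the destination address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes for the illustration. */
enum toy_disposition {
	TOY_DELIVER,	/* matches a local address: pass up the stack */
	TOY_DROP,	/* link-local and not ours: never forwarded */
	TOY_FORWARD	/* candidate for the normal forwarding path */
};

/* True for 169.254.0.0/16 (0xa9fe0000/16), address in host byte order. */
bool
toy_in_linklocal(uint32_t addr)
{
	return ((addr & 0xffff0000u) == 0xa9fe0000u);
}

/* Exact-match lookup first, link-local check second. */
enum toy_disposition
toy_ip_input(uint32_t dst, const uint32_t *local, size_t nlocal)
{
	for (size_t i = 0; i < nlocal; i++) {
		if (local[i] == dst)
			return (TOY_DELIVER);
	}
	if (toy_in_linklocal(dst))
		return (TOY_DROP);
	return (TOY_FORWARD);
}
```

A packet addressed to a link-local address configured on the host is
delivered, while one addressed to any other 169.254.x.x address is dropped
rather than forwarded.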
Re: Re: Re: freeBSD nullfs together with nfs and silly rename
On Sat, Jun 12, 2010 at 07:06:11PM -0400, Rick Macklem wrote:
> I'd say that having silly rename happen once in a while for unlink when
> it doesn't have to happen is better than having the file deleted on the
> server while it is still open on the client.

My note was not an objection, only a note.

Also, when committing, please add a comment explaining what is going on.
Re: nfsv4_server_enable=YES: link_elf: symbol svcpool_destroy undefined
On Sat, Jun 12, 2010 at 10:08:31PM -0400, Rick Macklem wrote:
> If you apply this patch and rebuild the module, it will find the symbols.
> (Is that what is supposed to happen or is something broken?)
>
> --- fs/nfsserver/nfs_nfsdport.c.sav	2010-06-12 20:27:53.0 -0400
> +++ fs/nfsserver/nfs_nfsdport.c	2010-06-12 20:37:09.0 -0400
> @@ -3147,4 +3147,6 @@
>  MODULE_VERSION(nfsd, 1);
>  MODULE_DEPEND(nfsd, nfscommon, 1, 1, 1);
>  MODULE_DEPEND(nfsd, nfslockd, 1, 1, 1);
> +MODULE_DEPEND(nfsd, krpc, 1, 1, 1);
> +MODULE_DEPEND(nfsd, nfssvc, 1, 1, 1);

I think the patch is right, this is how symbol resolution is supposed
to work.