Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 09:20 -0800, Adrian Chadd wrote: On 8 January 2013 08:15, Richard Sharpe rsha...@richardsharpe.com wrote: On Tue, 2013-01-08 at 07:36 -0800, Adrian Chadd wrote: .. or you could abstract it out a bit and use freebsd's aio_waitcomplete() or kqueue aio notification. It'll then behave much saner. Yes, going forward that is what I want to do ... this would work nicely with a kqueue back-end for Samba's tevent subsystem, and if someone has not already written such a back end, I will have to do so, I guess. Embrace FreeBSD's nice asynchronous APIs for doing things! You know you want to! (Then, convert parts of samba over to use grand central dispatch... :-) Seriously though - I was doing network/disk IO using real time signals what, 10 + years ago on Linux and it plain sucked. AIO + kqueue + waitcomplete is just brilliant. kqueue for signal delivery is also just brilliant. Just saying. The problem with a fully event-driven approach is that it will not work, it seems to me. Eventually, you find something that is not async and then you have to go threaded. (Because handling multiple clients in one process is very useful and you do not want client-A's long-running op preventing client-B's short-running op from being serviced.) Then, you run into problems like Posix's insistence that all threads in a process must use the same credentials (ie, uid and gids must be the same across all threads), although there is a hack on Linux to work around this behind glibc's back. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Wed, 2013-01-09 at 10:06 +0800, David Xu wrote:

[...]

This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext back into kernel space and uses it for future signal delivery. The code should be modified as follows:

    void handler(int signum, siginfo_t *info, ucontext_t *uap)
    {
        ...
        if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
            sigaddset(&uap->uc_sigmask, signum);

Hmmm, this seems unlikely because the signal handler is operating in user mode and has no access to kernel-mode variables.

Well, it turns out that your suggestion was correct. I did some more searching and found another similar suggestion, so I gave it a whirl, and it works. Now, my problem is that Jeremy Allison thinks that it is a fugly hack. This means that I will probably have big problems getting a patch for this into Samba. I guess a couple of questions I have now are:

1. Is this the same for all versions of FreeBSD since POSIX RT signals were introduced?

I have checked the source code and found that RT signals have been supported since FreeBSD 7.0, and the aio code uses the signal queue.

2. Which (interpretation of which) combination of standards requires such an approach?

The way I introduced is standard: http://pubs.opengroup.org/onlinepubs/007904975/functions/sigaction.html I quoted some text here:

When a signal is caught by a signal-catching function installed by sigaction(), a new signal mask is calculated and installed for the duration of the signal-catching function (or until a call to either sigprocmask() or sigsuspend() is made). This mask is formed by taking the union of the current signal mask and the value of the sa_mask for the signal being delivered [XSI] [Option Start] unless SA_NODEFER or SA_RESETHAND is set, [Option End] and then including the signal being delivered. If and when the user's signal handler returns normally, the original signal mask is restored. ...
When the signal handler returns, the receiving thread resumes execution at the point it was interrupted unless the signal handler makes other arrangements. If longjmp() or _longjmp() is used to leave the signal handler, then the signal mask must be explicitly restored.

This volume of IEEE Std 1003.1-2001 defines the third argument of a signal handling function when SA_SIGINFO is set as a void * instead of a ucontext_t *, but without requiring type checking. New applications should explicitly cast the third argument of the signal handling function to ucontext_t *.

The above means the third parameter points to a ucontext_t, which is used to restore the previously interrupted context; the context contains a signal mask, which is also restored. http://pubs.opengroup.org/onlinepubs/007904975/basedefs/ucontext.h.html

OK, thank you for that. Jeremy agrees that this is a portable approach, at least across Linux, FreeBSD and Solaris. We will try to get a fix into Samba to do it the correct way.
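As an illustration of the behaviour described above, here is a minimal sketch, not Samba's actual code. It assumes a single-threaded process; QUEUE_CAPACITY is a hypothetical stand-in for TEVENT_SA_INFO_QUEUE_COUNT, and the helper names deliver_until_full() and still_pending() are invented for this example. The handler records each siginfo_t and, once its array is full, adds the signal to uc_sigmask so that the mask restored on return keeps the remaining queued instances pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define QUEUE_CAPACITY 3 /* hypothetical stand-in for TEVENT_SA_INFO_QUEUE_COUNT */

static siginfo_t queued_info[QUEUE_CAPACITY];
static volatile sig_atomic_t queued_count;

/* Store the siginfo; when the array fills, block signum in the mask that
 * will be restored when this handler returns. */
static void rt_handler(int signum, siginfo_t *info, void *ctx)
{
    ucontext_t *uap = (ucontext_t *)ctx;

    if (queued_count < QUEUE_CAPACITY) {
        queued_info[queued_count] = *info;
        queued_count++;
    }
    if (queued_count == QUEUE_CAPACITY)
        sigaddset(&uap->uc_sigmask, signum);
}

/* Queue `total` instances of signum while it is blocked, then unblock it.
 * Returns how many instances the handler accepted, or -1 on error. */
int deliver_until_full(int signum, int total)
{
    struct sigaction sa;
    sigset_t set;
    union sigval value;
    int i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = rt_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signum, &sa, NULL) != 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signum);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    queued_count = 0;
    for (i = 0; i < total; i++) {
        value.sival_int = i;
        if (sigqueue(getpid(), signum, value) != 0)
            return -1;
    }

    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;
    return queued_count;
}

/* Returns 1 if an instance of signum is still pending, 0 if not, -1 on error. */
int still_pending(int signum)
{
    sigset_t pending;

    if (sigpending(&pending) != 0)
        return -1;
    return sigismember(&pending, signum);
}
```

Calling deliver_until_full(SIGRTMIN, 5) and then still_pending(SIGRTMIN) is one way to check that the handler stops at the capacity and that the excess instances stay queued until the main code unblocks the signal again.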
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 15:02 +0800, David Xu wrote:
On 2013/01/08 14:33, Richard Sharpe wrote:
On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
On 2013/01/08 09:27, Richard Sharpe wrote:

Hi folks,

I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.

Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array. The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.

However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals. I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued.)

Can someone confirm whether I have this correct or not?

I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.

It does try to block the signals in the signal handler using the following code (in the signal handler):

    if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
        /* we've filled the info array - block this signal until
           these ones are delivered */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);

However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.

This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext back into kernel space and uses it for future signal delivery.
The code should be modified as follows:

    void handler(int signum, siginfo_t *info, ucontext_t *uap)
    {
        ...
        if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
            sigaddset(&uap->uc_sigmask, signum);

Hmmm, this seems unlikely because the signal handler is operating in user mode and has no access to kernel-mode variables. I guess I will just have to read the code.
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 07:36 -0800, Adrian Chadd wrote:

.. or you could abstract it out a bit and use FreeBSD's aio_waitcomplete() or kqueue aio notification. It'll then behave much saner.

Yes, going forward that is what I want to do ... this would work nicely with a kqueue back-end for Samba's tevent subsystem, and if someone has not already written such a back end, I will have to do so, I guess.
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 08:14 -0800, Richard Sharpe wrote:
On Tue, 2013-01-08 at 15:02 +0800, David Xu wrote:
On 2013/01/08 14:33, Richard Sharpe wrote:
On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
On 2013/01/08 09:27, Richard Sharpe wrote:

Hi folks,

I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.

Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array. The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.

However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals. I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued.)

Can someone confirm whether I have this correct or not?

I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.

It does try to block the signals in the signal handler using the following code (in the signal handler):

    if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
        /* we've filled the info array - block this signal until
           these ones are delivered */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);

However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.
This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext back into kernel space and uses it for future signal delivery. The code should be modified as follows:

    void handler(int signum, siginfo_t *info, ucontext_t *uap)
    {
        ...
        if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
            sigaddset(&uap->uc_sigmask, signum);

Hmmm, this seems unlikely because the signal handler is operating in user mode and has no access to kernel-mode variables.

Well, it turns out that your suggestion was correct. I did some more searching and found another similar suggestion, so I gave it a whirl, and it works. Now, my problem is that Jeremy Allison thinks that it is a fugly hack. This means that I will probably have big problems getting a patch for this into Samba. I guess a couple of questions I have now are:

1. Is this the same for all versions of FreeBSD since POSIX RT signals were introduced?

2. Which (interpretation of which) combination of standards requires such an approach?
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 22:24 -0500, Daniel Eischen wrote:
On Tue, 8 Jan 2013, Daniel Eischen wrote:
On Tue, 8 Jan 2013, Richard Sharpe wrote:

[ ... ]

Well, it turns out that your suggestion was correct. I did some more searching and found another similar suggestion, so I gave it a whirl, and it works. Now, my problem is that Jeremy Allison thinks that it is a fugly hack. This means that I will probably have big problems getting a patch for this into Samba.

I don't understand why JA thinks this is a hack. Their current method doesn't work, or at least isn't portable. I've tried this on Solaris 10, and it works just as it does in FreeBSD. Test program included after signature.

    $ ./test_sigprocmask
    Sending signal 16
    Got signal 16, blocked: true
    Blocking signal 16 using method 0
    Handled signal 16, blocked: false
    Sending signal 16
    Got signal 16, blocked: true
    Blocking signal 16 using method 1
    Handled signal 16, blocked: true

Weird - I just tested it on Linux (2.6.18-238.el5) and it works the same as FreeBSD and Solaris. Am I misunderstanding something? Is it possible that Samba's code is broken on all platforms?

It is possible :-) AIO is off by default in configure. Then, when you switch it on in configure you have to switch it on in the smb.conf.
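The test program referred to above is not reproduced in this thread. The following is a separate minimal sketch, not Daniel's program, that compares the two blocking methods under the assumption of a single-threaded process. The names blocked_after_handler(), VIA_SIGPROCMASK and VIA_UCONTEXT are invented for the example; method 0 calls sigprocmask() inside the handler and method 1 edits uc_sigmask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <ucontext.h>

enum block_method { VIA_SIGPROCMASK = 0, VIA_UCONTEXT = 1 };

static volatile sig_atomic_t method;

static void handler(int signum, siginfo_t *info, void *ctx)
{
    (void)info;
    if (method == VIA_SIGPROCMASK) {
        /* Changes the mask only while the handler runs; the saved mask
         * is restored when the handler returns. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);
    } else {
        /* Changes the saved mask that is restored on return. */
        ucontext_t *uap = (ucontext_t *)ctx;
        sigaddset(&uap->uc_sigmask, signum);
    }
}

/* Raise signum once and report whether it is blocked after the handler
 * has returned: 1 if blocked, 0 if not, -1 on error. */
int blocked_after_handler(int signum, int how)
{
    struct sigaction sa;
    sigset_t set, now;

    /* Start from a known state with signum unblocked. */
    sigemptyset(&set);
    sigaddset(&set, signum);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signum, &sa, NULL) != 0)
        return -1;

    method = how;
    if (raise(signum) != 0)
        return -1;

    /* A NULL set only queries the current mask. */
    if (sigprocmask(SIG_SETMASK, NULL, &now) != 0)
        return -1;
    return sigismember(&now, signum);
}
```

With this sketch, method 0 should leave the signal unblocked after the handler returns and method 1 should leave it blocked, which matches the "blocked: false" and "blocked: true" lines in the output quoted above.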
Is it possible to block pending queued RealTime signals (AIO originating)?
Hi folks, I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct. Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array. The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba. However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals. I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued.) Can someone confirm whether I have this correct or not? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Mon, 2013-01-07 at 22:24 -0500, Daniel Eischen wrote: On Mon, 7 Jan 2013, Richard Sharpe wrote: Hi folks, I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct. Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array. The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba. However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals. I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued.) Can someone confirm whether I have this correct or not? If true, could they not use sigwaitinfo() from a separate thread instead and just bypass having to use a signal handler altogether? That thread can either call sigwaitinfo() when it is ready to receive more signals, or block on a semaphore/CV/whatever while events are being processed. So, I guess that what I want is something that will continue to work for both Linux and FreeBSD with minimal code divergence ... I guess I need to write a simpler program to check what the deal is. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
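The handler-free approach suggested above could look roughly like the sketch below. It is a simplification under stated assumptions: it runs in a single thread and uses sigtimedwait() with a zero timeout so that it never blocks, whereas a dedicated thread as proposed would call sigwaitinfo() in the same kind of loop (and a threaded program would use pthread_sigmask() rather than sigprocmask()). The function names block_signal(), queue_to_self() and drain_signals() are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Block signum so that instances queue up instead of invoking a handler. */
int block_signal(int signum)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signum);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Queue one instance of signum to this process, carrying `value`. */
int queue_to_self(int signum, int value)
{
    union sigval sv;

    sv.sival_int = value;
    return sigqueue(getpid(), signum, sv);
}

/* Dequeue at most `max` queued instances of signum without blocking.
 * The sival_int of each instance is stored in values[]; the number of
 * instances dequeued is returned. Anything beyond `max` stays queued. */
int drain_signals(int signum, int *values, int max)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = {0, 0};
    int n = 0;

    sigemptyset(&set);
    sigaddset(&set, signum);
    while (n < max && sigtimedwait(&set, &info, &zero) == signum)
        values[n++] = info.si_value.sival_int;
    return n;
}
```

Because the consumer decides how many instances to dequeue per call, the "stop at N and drain later" behaviour falls out of the `max` argument rather than from manipulating the signal mask inside a handler.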
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
On 2013/01/08 09:27, Richard Sharpe wrote:

Hi folks,

I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.

Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array. The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.

However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals. I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued.)

Can someone confirm whether I have this correct or not?

I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.

It does try to block the signals in the signal handler using the following code (in the signal handler):

    if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
        /* we've filled the info array - block this signal until
           these ones are delivered */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);

However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.
Re: Possible obscure socket leak when system under load and listener is slow to accept
On Sun, 2012-12-09 at 00:10 -0800, Alfred Perlstein wrote: On 12/8/12 5:05 PM, Richard Sharpe wrote: On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote: Hi folks, Our QA group (at xxx) using Samba and smbtorture has been seeing a lot of cases where accept returns ECONNABORTED because the system load is high and Samba has a large listen backlog. Every now and then we get a crash in smbd or in winbindd and winbindd complains of too many open files in the system. In looking at kern_accept, it seems to me that FreeBSD can leak a socket when kern_accept calls soaccept on it but gets ECONNABORTED. This error is the only error returned from tcp_usr_accept. It seems like the socket taken off so_comp is never freed in this case and that there has been a call on soref on it as well, so that something like the following is needed in the error path: //some-path/freebsd/sys/kern/uipc_syscalls.c#1 - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c @@ -433,6 +433,14 @@ */ if (name) *namelen = 0; + /* +* We need to close the socket we unlinked +* so we do not leak it. +*/ + ACCEPT_LOCK(); + SOCK_LOCK(so); + soclose(so); goto noconnection; } if (sa == NULL) { I think an soclose is needed at this point because soisconnected has been called on the socket. Do you think this analysis is reasonable? We are using FreeBSD 8.0 but it seems the same is true for 9.0. However, maybe I am wrong since I am not sure if the fdclose call would free the socket, but a quick look suggested that it doesn't. The fdclose should properly tear down the file descriptor. The call graph is: fdclose() - fdrop() - _fdrop() - fo_close()/soo_close() - soclose() - sorele() - sofree() - sodealloc(). A socket leak would not count against kern.maxfiles unless the file descriptor leaks as well. So it is unlikely that this is the problem. OK, thanks for the feedback. I will keep looking. Samba may open a large number of files (real files and sockets) and you may run into the maxfiles limit. 
You can check the limit with sysctl kern.maxfiles and increase it at boot time in /boot/loader.conf with kern.maxfiles=10 for example.

Well, some of the smbds are dying, but it is possible that there is a file leak in Samba or our VFS that we are tripping as well.

lsof and sockstat can be helpful. lsof may be able to help determine if there's a leak because it MAY find sockets not associated with a process. Hope this helps.

Thanks Alfred. After following through the call graph and confirming (with the code) that it was correct, I am now pretty convinced that I was wrong in assuming that it was a socket leak. However, lsof will be useful in allowing me to see how many FDs each smbd in this test is using. We have, I am told, kern.maxfiles set to 65536, which I think might be a little low for the test they are running.
Possible obscure socket leak when system under load and listener is slow to accept
Hi folks,

Our QA group (at xxx) using Samba and smbtorture has been seeing a lot of cases where accept returns ECONNABORTED because the system load is high and Samba has a large listen backlog. Every now and then we get a crash in smbd or in winbindd and winbindd complains of too many open files in the system.

In looking at kern_accept, it seems to me that FreeBSD can leak a socket when kern_accept calls soaccept on it but gets ECONNABORTED. This error is the only error returned from tcp_usr_accept. It seems like the socket taken off so_comp is never freed in this case and that there has been a call on soref on it as well, so that something like the following is needed in the error path:

    //some-path/freebsd/sys/kern/uipc_syscalls.c#1 - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c
    @@ -433,6 +433,14 @@
              */
             if (name)
                     *namelen = 0;
    +        /*
    +         * We need to close the socket we unlinked
    +         * so we do not leak it.
    +         */
    +        ACCEPT_LOCK();
    +        SOCK_LOCK(so);
    +        soclose(so);
             goto noconnection;
         }
         if (sa == NULL) {

I think an soclose is needed at this point because soisconnected has been called on the socket. Do you think this analysis is reasonable? We are using FreeBSD 8.0 but it seems the same is true for 9.0.

However, maybe I am wrong since I am not sure if the fdclose call would free the socket, but a quick look suggested that it doesn't. I would appreciate your feedback.
Re: Possible obscure socket leak when system under load and listener is slow to accept
On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote: Hi folks, Our QA group (at xxx) using Samba and smbtorture has been seeing a lot of cases where accept returns ECONNABORTED because the system load is high and Samba has a large listen backlog. Every now and then we get a crash in smbd or in winbindd and winbindd complains of too many open files in the system. In looking at kern_accept, it seems to me that FreeBSD can leak a socket when kern_accept calls soaccept on it but gets ECONNABORTED. This error is the only error returned from tcp_usr_accept. It seems like the socket taken off so_comp is never freed in this case and that there has been a call on soref on it as well, so that something like the following is needed in the error path: //some-path/freebsd/sys/kern/uipc_syscalls.c#1 - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c @@ -433,6 +433,14 @@ */ if (name) *namelen = 0; + /* +* We need to close the socket we unlinked +* so we do not leak it. +*/ + ACCEPT_LOCK(); + SOCK_LOCK(so); + soclose(so); goto noconnection; } if (sa == NULL) { I think an soclose is needed at this point because soisconnected has been called on the socket. Do you think this analysis is reasonable? We are using FreeBSD 8.0 but it seems the same is true for 9.0. However, maybe I am wrong since I am not sure if the fdclose call would free the socket, but a quick look suggested that it doesn't. The fdclose should properly tear down the file descriptor. The call graph is: fdclose() - fdrop() - _fdrop() - fo_close()/soo_close() - soclose() - sorele() - sofree() - sodealloc(). A socket leak would not count against kern.maxfiles unless the file descriptor leaks as well. So it is unlikely that this is the problem. OK, thanks for the feedback. I will keep looking. Samba may open a large number of files (real files and sockets) and you may run into the maxfiles limit. 
You can check the limit with sysctl kern.maxfiles and increase it at boot time in /boot/loader.conf with kern.maxfiles=10 for example.

Well, some of the smbds are dying, but it is possible that there is a file leak in Samba or our VFS that we are tripping as well.
Re: Possible problems with mmap/munmap on FreeBSD ...
On Tue, 29 Mar 2005, David Schultz wrote:
On Tue, Mar 29, 2005, Richard Sharpe wrote:

Hi, I am having some problems with the tdb package on FreeBSD 4.6.2 and 4.10. One of the things the above package does is:

    mmap the tdb file to a region of memory
    store stuff in the region (memmove etc).
    when it needs to extend the size of the region {
        munmap the region
        write data at the end of the file
        mmap the region again with a larger size
    }

What I am seeing is that after the munmap the data written to the region is gone. However, if I insert an msync before the munmap, everything is nicely coherent. This seems odd (in the sense that it works without the msync under Linux). The region is mmapped with:

    mmap(NULL, tdb->map_size, PROT_READ|(tdb->read_only? 0:PROT_WRITE), MAP_SHARED|MAP_FILE, tdb->fd, 0);

It looks like all of the underlying pages are getting invalidated in vm_object_page_remove(). This is clearly the right thing to do for private mappings, but it seems wrong for shared mappings. Perhaps Alan has some insight.

OK, a simple test program that:

    writes some content C1 to a file
    mmaps file to S1
    writes content C2 to S1
    munmaps S1
    mmaps S1
    compares

shows expected behavior.

    writes content C1 to S1
    munmaps S1
    mmaps S1
    compares

shows expected behavior.

So, now to do things like extend the file after mmapping etc. to see where the problem lies.

Regards - Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
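The test program itself was not posted. A rough sketch of such a check, including the "extend the file, then map it again" step that tdb performs, might look like the following. Everything here is an assumption made for illustration: the function name shared_mapping_survives_remap(), the 4096-byte region size, and the temporary file under /tmp. It deliberately omits msync() before munmap(), since that is the case being questioned.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION 4096 /* assumed region size for this sketch */

/* Returns 1 if data stored through a MAP_SHARED mapping is still visible
 * after munmap (without msync), a write that extends the file, and a larger
 * re-mmap; 0 if the data was lost; -1 on a setup error. */
int shared_mapping_survives_remap(void)
{
    char path[] = "/tmp/mmap_check_XXXXXX";
    char zeros[REGION];
    const char msg[] = "content C2";
    char *p;
    int fd, ok;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */

    /* Content C1: one region of zero bytes. */
    memset(zeros, 0, sizeof(zeros));
    if (write(fd, zeros, sizeof(zeros)) != (ssize_t)sizeof(zeros))
        goto fail;

    /* Map the file and store content C2 through the mapping. */
    p = mmap(NULL, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    memcpy(p, msg, sizeof(msg));
    if (munmap(p, REGION) != 0) /* note: no msync() first */
        goto fail;

    /* Extend the file by writing at its end. */
    if (lseek(fd, 0, SEEK_END) < 0)
        goto fail;
    if (write(fd, zeros, sizeof(zeros)) != (ssize_t)sizeof(zeros))
        goto fail;

    /* Map again with the larger size and compare. */
    p = mmap(NULL, 2 * REGION, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;
    ok = (memcmp(p, msg, sizeof(msg)) == 0);
    munmap(p, 2 * REGION);
    close(fd);
    return ok;

fail:
    close(fd);
    return -1;
}
```

A return value of 1 corresponds to the "expected behavior" result reported above; a 0 on a given system would reproduce the data loss described in the original report.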
Possible problems with mmap/munmap on FreeBSD ...
Hi, I am having some problems with the tdb package on FreeBSD 4.6.2 and 4.10. One of the things the above package does is:

    mmap the tdb file to a region of memory
    store stuff in the region (memmove etc).
    when it needs to extend the size of the region {
        munmap the region
        write data at the end of the file
        mmap the region again with a larger size
    }

What I am seeing is that after the munmap the data written to the region is gone. However, if I insert an msync before the munmap, everything is nicely coherent. This seems odd (in the sense that it works without the msync under Linux). The region is mmapped with:

    mmap(NULL, tdb->map_size, PROT_READ|(tdb->read_only? 0:PROT_WRITE), MAP_SHARED|MAP_FILE, tdb->fd, 0);

What I notice is that all the calls to mmap return the same address. A careful reading of the man pages for mmap and munmap does not suggest that I am doing anything wrong. Is it possible that FreeBSD is deferring flushing the dirty data, and then forgets to do it when the same starting address is used etc?

Regards - Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Re: Error in my C programming
On Mon, 21 Feb 2005, Kathy Quinlan wrote:
Peter Jeremy wrote:
On Mon, 2005-Feb-21 00:22:56 +0800, Kathy Quinlan wrote:

These are some of the errors I get in pairs for each of the above variables:

    Wtrend_Drivers.c:15: conflicting types for `Receiver'
    Wtrend_Drivers.h:9: previous declaration of `Receiver'

Without knowing exactly what is on those lines, it's difficult to offer any concrete suggestions. Two possible ways forward:

1) Change the declaration at Wtrend_Drivers.h:9 to be 'extern'
2) Pre-process the source and have a close look at the definitions and declarations for Receiver. You may have a stray #define that is confusing the type or a missing semicolon.

Peter

Here is a section of my code:

*** Wtrend_Drivers.c ***

    (12) void Reset_Network (unsigned char Network)
    (13) {
    (14) Length = 0x00;
    (15) Receiver = 0x00;
    (16) Node = 0xFF;
    (17) Command = Reset;
    (18) Make_Packet_Send(Head , Length, Network, Receiver, Node, Command, p_Data);
    (19) }

*** Wtrend_Drivers.h ***

    unsigned char Length , Network , Receiver , Node , Command = 0x00;

The above is line 9 of the Wtrend_Drivers.h. The numbers in () I have added to show the line numbers in Wtrend_Drivers.c.

These are some of the errors I get in pairs for each of the above variables:

    Wtrend_Drivers.c:15: conflicting types for `Receiver'
    Wtrend_Drivers.h:9: previous declaration of `Receiver'

Ummm, move the definition of all those variables to before their first use and see what that does. Also, check that you do not have an earlier definition that does not include the extern keyword.

Regards - Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
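For reference, the "extern in the header, one definition in a .c file" arrangement suggested above can be sketched as follows. This is an illustration only and makes no claim about the actual cause of the reported error, which the thread does not settle. The header name drivers.h, the function reset_fields() and the placeholder command value 0x01 are hypothetical. Note also that in a declaration such as `unsigned char a, b, c = 0x00;` the initializer applies only to `c`, which is why each definition below carries its own.

```c
/* ---- would live in a header, e.g. a hypothetical drivers.h ---- */
/* Declarations only: they can be included by many .c files without
 * creating multiple definitions. */
extern unsigned char Length, Receiver, Node, Command;
void reset_fields(void);

/* ---- would live in exactly one .c file ---- */
/* The single definition of each variable, each with its own initializer. */
unsigned char Length = 0x00;
unsigned char Receiver = 0x00;
unsigned char Node = 0x00;
unsigned char Command = 0x00;

/* Assignments are statements, so they must sit inside a function body.
 * At file scope an older compiler would read `Receiver = 0x00;` as a new
 * declaration with implicit int. */
void reset_fields(void)
{
    Length = 0x00;
    Receiver = 0x00;
    Node = 0xFF;
    Command = 0x01; /* placeholder value, not from the original code */
}
```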
Re: Error in my C programming
On Sun, 20 Feb 2005, Michael C. Shultz wrote:

Here is a section of my code:

*** Wtrend_Drivers.c ***

    (12) void Reset_Network (unsigned char Network)
    (13) {
    (14) Length = 0x00;
    (15) Receiver = 0x00;
    (16) Node = 0xFF;
    (17) Command = Reset;
    (18) Make_Packet_Send(Head , Length, Network, Receiver, Node, Command, p_Data);
    (19) }

*** Wtrend_Drivers.h ***

    unsigned char Length , Network , Receiver , Node , Command = 0x00;

The above is line 9 of the Wtrend_Drivers.h. The numbers in () I have added to show the line numbers in Wtrend_Drivers.c.

These are some of the errors I get in pairs for each of the above variables:

    Wtrend_Drivers.c:15: conflicting types for `Receiver'
    Wtrend_Drivers.h:9: previous declaration of `Receiver'

I would try putting the variables in the header file on separate lines. For example:

    unsigned char Length = 0;
    unsigned char Network = 0;
    unsigned char Receiver = 0;
    etc.

Done that to no avail :( Regards, Kat.

I wonder if Receiver is defined in an include file elsewhere? I checked all the header files on my system and it isn't, perhaps it is on yours though? Maybe easier to rename it?

However, the error messages point out that the conflicting definition is where Receiver is first used in the function in the .c file. If it was another definition, we would be told of the actual .h file where the definition came from. I have seen that lots of times :-)

Regards - Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Re: Error in my C programming
On Sun, 20 Feb 2005, Michael C. Shultz wrote:

*** Wtrend_Drivers.c ***

    (12) void Reset_Network (unsigned char Network)
    (13) {
    (14) Length = 0x00;
    (15) Receiver = 0x00;
    (16) Node = 0xFF;
    (17) Command = Reset;
    (18) Make_Packet_Send(Head , Length, Network, Receiver, Node, Command, p_Data);
    (19) }

*** Wtrend_Drivers.h ***

    unsigned char Length , Network , Receiver , Node , Command = 0x00;

The above is line 9 of the Wtrend_Drivers.h. The numbers in () I have added to show the line numbers in Wtrend_Drivers.c.

These are some of the errors I get in pairs for each of the above variables:

    Wtrend_Drivers.c:15: conflicting types for `Receiver'
    Wtrend_Drivers.h:9: previous declaration of `Receiver'

[Deletia ..]

I wonder if Receiver is defined in an include file elsewhere? I checked all the header files on my system and it isn't, perhaps it is on yours though? Maybe easier to rename it?

However, the error messages point out that the conflicting definition is where Receiver is first used in the function in the .c file. If it was another definition, we would be told of the actual .h file where the definition came from. I have seen that lots of times :-) Regards

You're right. We do not have enough of her code. I tried this:

    #include <stdio.h>
    unsigned char Receiver = 0;
    int main(void)
    {
        Receiver = 0x00;
        printf("Receiver -=%c\n", Receiver);
        return(0);
    }

compiled it with:

    gcc -W -Wall -ansi -pedantic -Wbad-function-cast -Wcast-align \
        -Wcast-qual -Wchar-subscripts -Winline \
        -Wmissing-prototypes -Wnested-externs -Wpointer-arith \
        -Wredundant-decls -Wshadow -Wstrict-prototypes zz.c -o zz

and no warnings.

In private correspondence with the person asking the question it was indicated that initially GCC was used with no flags.
Regards - Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Throughput problems with NFS between Linux and FreeBSD
Hi,

We recently encountered a problem with NFS throughput between a FreeBSD server (we are using 4.6.2, but the same code seems to be in 5.1 as well) and Linux clients. When using Linux 2.4.19 or 2.4.21 as a client (although this might extend to other clients) and copying a large file, you will see the behavior shown in http://www.richardsharpe.com/ethereal-stuff.html#Time%20Sequence%20Graphs

This happens because Linux hangs onto the ack for the last segment of a 32kB+header send for a while. The FreeBSD NFS server will not put any more data in the socket because of an soreserve with a size of 32kB+header, so it waits for about 39 ms until Linux finally sends the ack for the last segment (unless there is data, like another command, going the other way, that is). Throughput is about 3MB/s on GigE.

The problem seems to be the following code

	if (so->so_type == SOCK_STREAM)
		siz = NFS_MAXPACKET + sizeof (u_long);
	else
		siz = NFS_MAXPACKET;
	error = soreserve(so, siz, siz);

in src/sys/nfs/nfs_syscalls.c. We added a sysctl to allow finer control over what is passed to soreserve. With the fix in, it goes up to around wire speed when lots of data is in the cache.

This was found by Chandu Gadhiraju with help from others.

Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Re: Throughput problems with NFS between Linux and FreeBSD
On Fri, 19 Sep 2003, John-Mark Gurney wrote:

Richard Sharpe wrote this message on Fri, Sep 19, 2003 at 10:38 -0700:

We recently encountered a problem with NFS throughput between a FreeBSD server (we are using 4.6.2, but the same code seems to be in 5.1 as well). When using Linux 2.4.19 or 2.4.21 as a client, although this might extend to other clients, and copying a large file, you will see the behavior shown in [...]

The problem seems to be the following code

	if (so->so_type == SOCK_STREAM)
		siz = NFS_MAXPACKET + sizeof (u_long);
	else
		siz = NFS_MAXPACKET;
	error = soreserve(so, siz, siz);

in src/sys/nfs/nfs_syscalls.c. We added a sysctl to allow finer control over what is passed to soreserve. With the fix in, it goes up to around wire speed when lots of data is in the cache.

What is the fix? You don't say what adjustments to soreserve's parameters are necessary to improve performance? Have you done testing against other clients to see how your changes will affect performance on those machines?

The best fix is:

	if (so->so_type == SOCK_STREAM)
-		siz = NFS_MAXPACKET + sizeof (u_long);
+		siz = NFS_MAXPACKET + sizeof (u_long) + MSS;
	else
		siz = NFS_MAXPACKET;
	error = soreserve(so, siz, siz);

in src/sys/nfs/nfs_syscalls.c. This works because the client should only hang onto the ack for one segment, and it will work even if you have end-to-end jumbo frames. A simpler fix might be to replace MSS with 2048.

Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
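Editorial sketch: the sizing rule proposed above can be written out as a small userland function to make the arithmetic visible. This is only an illustration under stated assumptions, not the kernel code. The `SKETCH_` constants (404-byte header, 32768-byte data) and the function name `sketch_nfs_reserve` are assumptions introduced here; check the NFS headers of your release for the real values.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only; verify against your source tree. */
#define SKETCH_NFS_MAXDATA   32768u
#define SKETCH_NFS_MAXPKTHDR 404u
#define SKETCH_NFS_MAXPACKET (SKETCH_NFS_MAXPKTHDR + SKETCH_NFS_MAXDATA)

/*
 * Compute the socket buffer reservation for an NFS server socket.
 * is_stream: nonzero for a TCP (SOCK_STREAM) socket.
 * mss:       the connection's maximum segment size in bytes.
 */
static size_t
sketch_nfs_reserve(int is_stream, size_t mss)
{
	size_t siz = SKETCH_NFS_MAXPACKET;

	assert(mss > 0);
	if (is_stream) {
		/* Record mark plus one extra segment of headroom, so a
		 * delayed ack on the last segment does not stall the sender. */
		siz += sizeof(unsigned long) + mss;
	}
	return siz;
}
```

With a 1460-byte MSS the stream reservation grows by exactly one segment over the original value, which is the whole point of the change: one un-acked segment no longer fills the reservation.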
Re: FreeBSD on Intel Server Board SE7501WV2
On Thu, 11 Sep 2003, Yaoping Ruan wrote:

Hello, We plan to install FreeBSD on an Intel Server Board SE7501WV2 with 1 XEON 2.4GHz CPU, 4GB PC2100 DDR memory, and a Seagate 120GB ATA 7200RPM hard disk, and use the server box for high demand SpecWeb99 tests. Does anyone have any experience with this Server Board, or see any compatibility problem here? The reason I am asking is that we had some kernel compatibility issues on other Intel boards.

Is that the dual-processor-capable board? If so, I have one of those at home, and it works with FreeBSD 4.7 or so. It certainly has the 7501 chipset in it and 2x1.8GHz Xeons.

Another question about this box is the two on-board Gigabit Network Controllers. Do they work fine on FreeBSD? May I use a fiber Giganet NIC like the Netgear GA621 on it?

They worked for me. I dunno about the fibre stuff.

Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Re: Looking for FreeBSD kernel debugging help
On Wed, 11 Jun 2003, Nat Lanza wrote: On Wed, 2003-06-11 at 06:22, Terry Lambert wrote: Someone should port the network debugging from Darwin using the tiny IP stack from NetBSD. Well, there's this: http://ipgdb.sourceforge.net/ IPGDB is a collection of extensions to GDB and FreeBSD-4.3 to allow two-machine kernel debugging over UDP. It behaves much like two-machine kernel debugging over serial ports. These extensions can easily be applied to other releases of FreeBSD. With a little bit of modification, these extension can be applied to other BSD variants. It hasn't been updated in a while, but it's definitely a start. It works pretty well for 4.3, and I know it's been updated to work with 4.6 (though possibly not in the sourceforge distribution). I think that Groggy was working on this a while back. Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
What prevents there from being a race with open(...,O_CREAT |O_EXCL,...)
Hi, In looking at vn_open, I see that it calls namei and then a little while later calls VOP_CREATE. If the user did open(..., ... O_CREAT | O_EXCL, ...), what prevents a race where one process discovers that the name doesn't already exist but another gets in and creates the name? Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
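Editorial sketch: the question above is about the kernel side, and this example does not answer it. It only shows the user-visible contract the kernel has to uphold: with O_CREAT | O_EXCL, the existence check and the creation are a single atomic step, so exactly one caller wins and every other caller sees EEXIST. The helper name `create_exclusive` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to create `path` exclusively.
 * Returns 1 if this call created the file, 0 if the name already
 * existed, and -1 on any other error (errno is left set).
 */
static int
create_exclusive(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd >= 0) {
		close(fd);
		return 1;
	}
	return (errno == EEXIST) ? 0 : -1;
}
```

Calling it twice on the same fresh name should return 1 and then 0. Two racing processes should never both get 1, which is exactly the property being asked about.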
Re: pw(8): $ (dollar sign) in username
On Fri, 27 Dec 2002, Ryan Thompson wrote:

Hi all, I've recently had the pleasure of configuring a FreeBSD machine as a Samba Primary Domain Controller. In smb.conf, one can specify an "add user script" directive to automate the creation of machine accounts. Otherwise, you have to manually create accounts for each machine on the network. See: http://us1.samba.org/samba/ftp/docs/htmldocs/Samba-PDC-HOWTO.html

Problem is, smb requires a '$' at the end of the username, which our pw(8) doesn't allow.

Well, the $ is only required for machine accounts! In Samba 3.0, and possibly in the latest 2.2.x releases, there is a separate 'add machine script' parameter. I think it would be better to simply frob the entry in the master.passwd in that script. While I have not tried it myself, I am led to believe that once you edit the entry and run the appropriate command, things work.

Allowing the $ is a one-character change to usr.sbin/pw/pw_user.c. Aside from the obvious pain of accidentally inserting shell variables as part of a username if the $ is not escaped, are there any specific problems with this change? Others would probably benefit from this. Is the change worth committing? Or would it be better to push this to pw.conf?

--- usr.sbin/pw/pw_user.c.orig	Sat Nov 16 21:55:28 2002
+++ usr.sbin/pw/pw_user.c	Fri Dec 27 11:17:33 2002
@@ -1195,7 +1195,7 @@
 pw_checkname(u_char *name, int gecos)
 {
 	int l = 0;
-	char const *notch = gecos ? ":!@" : " ,\t:+#%$^()!@~*?=|\\/\"";
+	char const *notch = gecos ? ":!@" : " ,\t:+#%^()!@~*?=|\\/\"";
 
 	while (name[l]) {
 		if (strchr(notch, name[l]) != NULL || name[l] < ' ' || name[l] == 127 ||

- Ryan

--
Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
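Editorial sketch: a narrower alternative to dropping '$' from the reject set entirely would be to accept it only as the final character, which is all a machine account needs. The function below illustrates that idea in isolation. It is not the pw(8) code: the name `sketch_name_ok`, the `allow_dollar` knob, and the reject list are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `name` is acceptable as a login name, 0 otherwise.
 * allow_dollar is a hypothetical policy switch: when set, a single
 * trailing '$' is accepted (machine account); '$' anywhere else is
 * always rejected.
 */
static int
sketch_name_ok(const char *name, int allow_dollar)
{
	/* Illustrative reject set, not the exact list used by pw(8). */
	static const char reject[] = " ,\t:+#%^()!@~*?=|\\/\"";
	size_t i, len;

	assert(name != NULL);
	len = strlen(name);
	if (len == 0)
		return 0;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		/* Control characters, DEL and listed punctuation are out. */
		if (c < ' ' || c == 127 || strchr(reject, c) != NULL)
			return 0;
		/* '$' only as the last character, and only if enabled. */
		if (c == '$' && !(allow_dollar && i == len - 1))
			return 0;
	}
	return 1;
}
```

The design choice here is to keep the shell-variable hazard contained: a name like `ws$01` is still refused even when machine accounts are enabled.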
Re: nsswitch help for you?
On Wed, 25 Dec 2002 [EMAIL PROTECTED] wrote:

Quoting Danny Braniss [EMAIL PROTECTED]: what exactly do you want/need? danny

Sorry, I'll try to get as specific as I can with my currently limited knowledge of the FBSD source code. Basically, I would like to know where I can find information on the nsswitch protocol (if there is even such a thing): perhaps a document or standards paper?

Well, here is what I think is needed: the ability to load shared libraries like /lib/libnss_ldap.so or /lib/libnss_winbind.so from within the *pw* and the *gr* calls, under control of /etc/nsswitch.conf. My suggestion would be to look at how Linux or Slowaris do this.

My understanding is that the current nsswitch on FreeBSD is hard-coded as to the places it can look in, so winbindd cannot be used as a backend, nor can ldap, etc. I spent some time looking for the interface spec on the web, but could not find one, although you can infer what it looks like by looking at the winbind_nss.c code in Samba. I imagine that libc is going to need to be modified a bit to handle this, but I am only guessing.

Regards - Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, sharpe[at]ethereal.com, http://www.richardsharpe.com
Re: [hackers] Re: Netgraph could be a router also.
On Mon, 11 Nov 2002, David Gilbert wrote: Terry == Terry Lambert [EMAIL PROTECTED] writes: Terry By it, I guess you mean FreeBSD? Terry What are your performance goals? Right now, I'd like to see 500 to 600 kpps. Terry Where is FreeBSD relative to those goals, right now, without Terry you doing anything to it? Without any work, we got 75 kpps. Terry Where is FreeBSD relative to those goals, right now, if you Terry tune it very carefully, but don't hack any code? With a few patches, including polling and some tuning, we got 150 to 200 kpps. Note that we've been focusing on pps, not Mbs. With 100M cards (what we're currently using) we want to focus on getting the routing speed up. One of the largest problems we've found with GigE adapters on FreeBSD is that their pps ability (never mind the volume of data) is less than half that of the fxp driver. This is intriguing. I have found with Samba that I am able to achieve approx 100MB/s read from cache with 1500B frame sizes (ie, no jumbo frames) over a BCM5701 on an 850 MHz PIII with FreeBSD 4.3 and similar rates from em0 on a 2GHz P4 with 4.6. Both results were with 1500B frames and considerable free CPU (50% on the 850MHz PIII). However, given that they were full 1500B frames (99%), at least in one direction, perhaps that does not count. Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: max phy mem known working with FreeBSD 4.x
On Tue, 5 Nov 2002, Terry Lambert wrote:

Matt wrote: Anyone knows the max physical mem that can be used with FreeBSD4.3? [ ... ] any chance going more than 4G?

Sure, if you want to install it to warm things up. No, if you want to access it; access is limited to 4G, because that's 32 bits of address space, and your machine is a 32 bit machine.

Well, the P4 does have an address extension that allows addressing of up to 64GB, PAE (Physical Address Extension). However, it requires some work. Would be nice in large Samba servers, though, to be able to cache enormous amounts of file data :-)

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com
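Editorial sketch: the 4G and 64GB figures in this exchange are just powers of two. A 32-bit physical address reaches 2^32 bytes, and PAE (Physical Address Extension) widens physical addresses to 36 bits, which reaches 2^36 bytes. The helper name below is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes reachable with an address of the given bit width. */
static uint64_t
addressable_bytes(unsigned bits)
{
	assert(bits < 64);	/* shifting by 64 or more is undefined */
	return (uint64_t)1 << bits;
}
```

Dividing `addressable_bytes(36)` by 2^30 gives 64, the 64GB ceiling mentioned above, while each process still sees only a 32-bit virtual address space.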
Re: max phy mem known working with FreeBSD 4.x
On Mon, 4 Nov 2002, Matt wrote: Anyone knows the max physical mem that can be used with FreeBSD4.3? Well, we have 4GB in a 4.6.2 system, and I think that we ran 4.3 on those systems for a while. However, you lose anywhere between 128M and 512M because of the PCI address space. Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Disk reliability (was: Tagged Command Queuing or Larger Cache?)
On Wed, 30 Oct 2002, Greg 'groggy' Lehey wrote: On Tuesday, 29 October 2002 at 2:03:50 +, Daniel O'Connor wrote: On Tue, 2002-10-29 at 01:54, Kenneth Culver wrote: I haven't had any trouble with the WDxxxBB drives - the WDxxxAA drives are pretty unreliable though. Hrmm, I havn't tried those, but just about every WD drive I've used has ended up with problems which were of course handled by the warranty, but even then, I still had to reinstall the os and pull a bunch of stuff from my backups which was a pain to do for each failure. Like I said, just my personal experience. I don't think the new 8MB cache drives have been out long enough to actually develop the problems I've seen on WD drives though. Yes, but my point is that the AA drives are bad, but the BB drives seem good. I have been using them for a while (~1 year) without trouble. I've had trouble with BB drives. Given that they have (or had) a 3 year warranty, 1 year of experience isn't very much to go by. Personally I find that no HD manufacturer has a good reputation - they have all made trashy drives at one point. Give the general time it takes for problems to surface vs product lifetimes makes deciding what to buy a PITA :( That's a more valid point. Note that WD and Seagate have dropped their warranty on IDE drives from 3 years to 1 year. What does this say to you? Hmmm, from what I remember, they did that for the 5400RPM drives, not the 7200RPM drives! Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: *nixopt, an idea
On Thu, 29 Aug 2002, Koul-Henning Pamp wrote: Fellow hackers, I'm creating a new unix optimizer tool, I've finished version 0.1 and need feedback. At the risk of feeding the feeble Troll :-) You would think that if this person has that much time on their hands, they would actually go find a project that needs developers and do something productive, like, say, Linux, or ... (list of about 1,000 projects elided). Them that can, do. Them that can't, talk about it. Sigh. Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Name service switch
On Wed, 24 Jul 2002, Terry Lambert wrote:

Richard Sharpe wrote: Hmmm, so what you are telling me is that winbindd will not work on FreeBSD even under 5.0?

What part of "FreeBSD 5.x implements NSS, but does not implement IRS" was unclear? 8-). The winbindd program is an NSS interfaced program; therefore it should work *fine* in 5.x, since NSS supposedly works fine in 5.x.

Well, you were saying that DSOs were unsafe or some such, so I assumed that FreeBSD 5.x's NSS did not support DSOs, and thus winbindd. I guess I should just try it.

-- Terry

--
Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Re: Name service switch
On Wed, 24 Jul 2002, Terry Lambert wrote:

Paul Khavkine wrote: Well the one we have in -CURRENT lacks dynamic module support (as does IRS). I just wanted to know if there were any issues for not implementing IRS before?

The BIND IRS implementation depends on use of the BIND resolver library. In FreeBSD, the resolver library is integrated into libc, so upgrading it is very, very hard compared to what it would be if it were broken out into a separate libresolv.

The use of loadable modules has two problems. The first is that it requires that binaries be dynamically, not statically, linked, because FreeBSD does not support a static libdlopen because of how symbol lookups are wedged for things like a NULL parameter, and importing of main object symbols by loaded modules (in fact, the ELF standard was never intended to support static linking), and some programs cannot be dynamically linked (anything run before /usr is mounted to get lib and libexec). The second is that dynamic linking and modules themselves open you up to security exploits based on inherent flaws in the idea in a hostile implementation environment.

Hmmm, so what you are telling me is that winbindd will not work on FreeBSD even under 5.0?

If you want to find an IRS patch set for FreeBSD, search for the terms irs nss ldap, and it will be in the top 5 or so.

-- Terry

--
Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Re: tuning for samba
On Sun, 14 Jul 2002, Doug Barton wrote:

Chad David wrote: So, I'm building a new box tonight and was wondering if anybody has any tried and true tuning parameters for samba on -stable.

Since you never got any actual answers to your question, I offer the following. The only samba tuning option I've ever seen make a difference is enabling socket options = TCP_NODELAY. Also, make sure that newreno is turned off on the samba host. As for nfs mount options, I have found that -3cisl works best for me, when the servers are sun, or netapp boxes.

How does turning off newreno help? We think we are seeing fast retransmit get confused in the presence of dropped packets. Is this possibly related?

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
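Editorial sketch: at the socket level, the smb.conf setting socket options = TCP_NODELAY amounts to a setsockopt() call that disables Nagle's algorithm on the connection. The wrapper name `set_nodelay` is hypothetical, and how Samba applies the option internally is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Disable Nagle's algorithm on a TCP socket so small writes are sent
 * immediately instead of being coalesced. Returns 0 on success and
 * -1 on failure, as setsockopt() does.
 */
static int
set_nodelay(int fd)
{
	int on = 1;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

For a request/response protocol like SMB, where the server often writes a short reply and then waits, this avoids the interaction between Nagle and delayed acks that otherwise adds latency to every exchange.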
Re: tuning for samba
On Wed, 10 Jul 2002, Darren Pilgrim wrote: Chad David wrote: A local company has been having issues with samba for some time (it kills an e250, and has seriously stressed an e5000) and I've been telling the admin (half seriously) that he should just toss it on a PC with FreeBSD. Well they finally got tired of hearing FreeBSD this and FreeBSD that and asked me to bring a box in if I was so confident... tomorrow morning at 9am. So, I'm building a new box tonight and was wondering if anybody has any tried and true tuning parameters for samba on -stable. They currently have ~700 users attached. The load per user is pretty low but just rebooting and handling the reconnects has killed small boxes. As a side note, the data being served will be attached to the samba server via NFS. The one thing I've seen kill a box besides the reboot-reconnect blast is content searches by the Windows Find dialog. All it takes is one user on a fast machine and network link doing the Windows equivalent of find / -name * -exec grep foo \{\} \; to run you out of file descriptors in a matter of seconds. Yes, Samba has to do readdir scans to simulate a case-insensitive file system on a case-sensitive file system. Samba uses a seperate process for each connection, and Windows opens one connection per share. Yes to the first claim, no to the second. Most definitely not. For a single client, windows puts all share access (net use, mounting, whatever you want to call it) over the single TCP connection to the server. The only time Windows will create a new connection is if you have given the server multiple NetBIOS names, and you use different NetBIOS names to access the share. For example, even if the NetBIOS names NB1 and NB2 translate to the same IP (10.10.10.10), if you do the following: net use f: \\nb1\share1 net use f: \\nb2\share1 the client will establish two different connections. However, that is the only way I know to get multiple connections from a client to a server. 
Even Terminal Server multiplexes multiple users over the one TCP connection. Most Windows users only work on one share at a time, so with two open shares on ~700 machines that means ~1400 connections with roughly half of them idle. That's a lot of freeable RAM should you suddenly need it. Nope, ~700 connections! Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: tuning for samba
On Thu, 11 Jul 2002, Darren Pilgrim wrote: Richard Sharpe wrote: On Wed, 10 Jul 2002, Darren Pilgrim wrote: Samba uses a seperate process for each connection, and Windows opens one connection per share. Yes to the first claim, no to the second. Most definitely not. For a single client, windows puts all share access (net use, mounting, whatever you want to call it) over the single TCP connection to the server. You're right, sorry. I had gotten mixed up on the multiple connection issue because of my own configuration that results in one share per connection. Nope, ~700 connections! Even with just one connection per machine, though, you're still going to have a significant amount of swappable memory in idle smbd processes. Yes, I agree. Something that I would like to do more about by making sure that as much as possible is shared. Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: tuning for samba
On Wed, 10 Jul 2002, Chad David wrote: A local company has been having issues with samba for some time (it kills an e250, and has seriously stressed an e5000) and I've been telling the admin (half seriously) that he should just toss it on a PC with FreeBSD. Well they finally got tired of hearing FreeBSD this and FreeBSD that and asked me to bring a box in if I was so confident... tomorrow morning at 9am. So, I'm building a new box tonight and was wondering if anybody has any tried and true tuning parameters for samba on -stable. They currently have ~700 users attached. The load per user is pretty low but just rebooting and handling the reconnects has killed small boxes. As others have said, memory is an issue. In some 'benchmark' testing, I have noticed that FreeBSD holds up pretty well to large numbers of connects coming in at one time, say compared to Linux. Starting up 100 clients during about two or three seconds (as long as it takes to fork 100 processes on the driver) does not kill a FreeBSD Samba server as much as it does a Linux server running Linux 2.4.x. Certainly, a 2GB machine that I regularly test against does not notice the smbds start up all that much. As a side note, the data being served will be attached to the samba server via NFS. Hmmm, some of the locking stuff might be an issue then ... Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Memory layout tools ... to tweak Samba's memory footprint
Hi, I am looking for tools that will help me find data-structures in Samba that should be in shared pages but aren't because they are not marked appropriately by the source code. Are there any tools that will allow me to break out the layout of an executable file, but also that will allow me to profile memory usage? From the sound of the various discussions I have heard about FreeBSD's VM, since Samba forks a copy of smbd for each connection, any pages that have not been written by children smbd's will be shared in anycase, so maybe there is not so much to worry about? Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: tuning for samba
On Wed, 10 Jul 2002, Chad David wrote:

On Thu, Jul 11, 2002 at 11:20:51AM +0930, Richard Sharpe wrote: Certainly, a 2GB machine that I regularly test against does not notice the smbds start up all that much.

I have no real way of testing this type of load here, but first thing tomorrow morning I'll know..

Up on samba.org in CVS under cifs-load-gen is a tool that can simulate clients. Simulating the startup of 100's of clients and then watching what happens to the server is not too hard, as long as you have a driver that can withstand the load of that many driver processes starting :-)

As a side note, the data being served will be attached to the samba server via NFS.

Hmmm, some of the locking stuff might be an issue then ...

This is my biggest concern. I just don't know what to tune here since the data just basically passes straight through the box, and with the amount of data being served and the access patterns, buffering is pointless. One thing I failed to mention: none of the clients ever write; the system is completely read only. Thanks.

--
Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Re: Adding readdir entries to the name cache ...
On Thu, 4 Jul 2002, Terry Lambert wrote:

[major snippage, much useful to think about here ...]

However, ls seems to call lstat in the same order that the files are in the directory, and a normal clock approach to directories would yield exactly the same result. Further, in the cases that the user did not want a -l, we would avoid adding many potentially useless names to the name cache and reducing its performance.

This is because the sort occurs first. An unsorted ls (which is available -- see the man page) doesn't have this issue.

I don't want to start a flame war, but a truss of ls -l shows the following:

getdirentries(0x5,0x809d000,0x1000,0x80990b4)	= 4096 (0x1000)
lstat(".gnome",0x809c248)	= 0 (0x0)
lstat(".mc",0x809c348)	= 0 (0x0)
lstat(".xinitrc",0x809c44c)	= 0 (0x0)
lstat("750B.pdf",0x809c54c)	= 0 (0x0)
lstat("Mail",0x809c648)	= 0 (0x0)
lstat("nsmail",0x809c748)	= 0 (0x0)
lstat(".cshrc",0x809c848)	= 0 (0x0)
lstat(".ssh",0x809c948)	= 0 (0x0)
lstat(".gnome_private",0x809ca50)	= 0 (0x0)
lstat(".xchat",0x809cb48)	= 0 (0x0)
lstat(".exmh",0x809cc48)	= 0 (0x0)
lstat(".ICEauthority",0x809cd50)	= 0 (0x0)
lstat(".netrc",0x809ce48)	= 0 (0x0)

This is the same order that 'ls -fal' produced. This suggests that ls is doing an unsorted lookup of the info, and then sorting. That is the way I would have done it as well.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
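Editorial sketch: the access pattern visible in the truss output (read directory entries, then lstat each one in the order it was returned, and sort only afterwards) looks roughly like the loop below. This is a simplified illustration, not the source of ls; the function name and the fixed-size path buffer are assumptions.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Walk `dirpath` in the order readdir() returns entries, calling
 * lstat() on each one as it is read. Returns the number of entries
 * successfully stat'ed, or -1 if the directory cannot be opened.
 * Any sorting for display would happen after this loop.
 */
static int
stat_in_directory_order(const char *dirpath)
{
	DIR *d;
	struct dirent *de;
	struct stat sb;
	char path[4096];
	int n = 0;

	assert(dirpath != NULL);
	d = opendir(dirpath);
	if (d == NULL)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", dirpath,
		    de->d_name) >= (int)sizeof(path))
			continue;	/* name too long for buffer; skip */
		if (lstat(path, &sb) == 0)
			n++;
	}
	closedir(d);
	return n;
}
```

Because the lookups follow directory order, a name cache primed by the directory read would be hit in the same sequence, which is the observation the message relies on.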
Re: Adding readdir entries to the name cache ...
On Thu, 4 Jul 2002, Terry Lambert wrote: Richard Sharpe wrote: Note that you can get another 8-12% by making negative cache entries, since DOS/Windows clients tend to search the full path each time, even though they have already received a successful response in the past (i.e. there is no client caching for this information, because there is no distributed coherency protocol that would permit server invalidation of the cache). Unmodified, SVR4 DNLC can not support negative cache entries (there need to be two line changes). Hmmm, I think that the major part of the problem there was that, for what ever reason, Barry Feigenbaum of IBM, declined to add a Change Working Directory or Set Working Diretory command to the SMB protocol. Thus, at least for the SMB protocol, and maybe generally, Windows clients must always send the full pathname for every file they want, unless it happens to be at the root of the share. Perhaps I am wrong about that. Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Case independent file name searched (was Re: Adding readdir entriesto the name cache ...)
On Thu, 4 Jul 2002, Terry Lambert wrote:

Richard Sharpe wrote: [1] Samba, because it has to support the Windows case insensitive file system, must do some pretty ugly things :-) When a client asks that a file be opened, for example, Samba tries with exactly the case that was presented. If that fails, it must do a readdir scan of the directory so it can do a case-insensitive match. So, even negative caching does not buy us much in the case of Samba.

What would help is a case insensitive filesystem. It is useful to be able to do case sensitive on storage, case insensitive on lookup on a per process basis. The easiest is if you wire this in as a flag on the proc itself. The normal way this is done is a flag to sfork, but... it should also be possible to have the proc open itself in procfs, and then ioctl() down a flag setting for this.

I have come to the conclusion that Terry is right. Having watched a cygwin-based build of a package, the behavio[u]r is just too ugly. When an include file is looked for, it causes Samba to do readdir scans for every directory in the -I chain that the include file is not in until it is found. If we could eliminate all those readdir scans, performance would improve dramatically.

Fundamentally, what I want to support is both UNIX clients (say, via NFS etc) and Windows clients being able to share files in the same directory. Samba already does case-preserving file name creation, and indeed, the problem does not go away even if Samba always case-folds all names to lower case, because a UNIX user or client might still create two files that differ only by the case of one or more characters in their names.

This means that Terry is right when he says I need an IOCTL. Basically, normal users get the normal case sensitive file system, while Windows clients, via an IOCTL which says "give ME case-independent lookups", get a slightly different file system.

To support that, however, I need to change the name cache hash function to be case-insensitive (there's more--see below). This means that name cache hash chains could get longer. In the worst case, if a file system contains large numbers of files with long names, all using the same characters and differing only by the case of individual characters, the hash chain becomes a linear search. However, UNIX file systems generally don't get like that. I imagine that the hash chains will grow to no more than twice their current size, but will probably grow by a factor close to one.

Another problem is the extra complexity required in cache_lookup. When we want case-insensitive lookups, we have to do extra work, even if we find a match in the cache. The problem is with files that differ by only the case of one or more characters. When this occurs, my view is that we should return the file with the longest string of exactly matching characters; however, we might allow the sys admin to set policy, at the expense of complicating things.

When we search a hash chain, if we get an exact match, we are done, but if we don't get an exact match, we still have to do a readdir scan to find a better match, and to ensure that we return consistent results. Similarly, when we do a readdir scan, if we get an exact match, we are done, but if we don't, we need to keep going.

Another aspect that needs consideration is the effect on negative caching. Getting a negative result on the exact name match is no good any longer, since there may be a case-insensitive match in the directory. This seems to make negative cache entries useless for case-insensitive matching.

Finally, I think that pursuing this subject some more is very important from the point of view of constructing high-performance CIFS servers, based on Samba or other software, so I would appreciate comments.
Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
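The two ideas in the message above, a hash that ignores case and a tie-break that prefers the candidate with the longest run of exactly matching characters, can be sketched in userland C. This is only an illustration under stated assumptions: the function names (ci_name_hash, exact_prefix_len, better_match) are hypothetical, FNV-1a stands in for whatever hash the name cache really uses, and nothing here is taken from the FreeBSD sources.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: FNV-1a over the lower-cased bytes of a name, so that
 * names differing only by case end up on the same hash chain.
 */
uint32_t
ci_name_hash(const char *name, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */
    size_t i;

    assert(name != NULL);
    for (i = 0; i < len; i++) {
        h ^= (uint32_t)tolower((unsigned char)name[i]);
        h *= 16777619u;                 /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Length of the leading run of bytes that match exactly (case-sensitive). */
size_t
exact_prefix_len(const char *want, const char *have)
{
    size_t n = 0;

    while (want[n] != '\0' && want[n] == have[n])
        n++;
    return n;
}

/*
 * Tie-break between two candidates that both match case-insensitively:
 * return the one sharing the longest exact prefix with the requested name
 * (the first candidate wins a draw).
 */
const char *
better_match(const char *want, const char *a, const char *b)
{
    return exact_prefix_len(want, a) >= exact_prefix_len(want, b) ? a : b;
}
```

Folding case inside the hash keeps the chain lookup cheap; the exact-prefix comparison is only needed once a chain holds more than one case-insensitive match.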
Re: Adding readdir entries to the name cache ...
On Fri, 5 Jul 2002, Garance A Drosihn wrote:

At 6:29 AM +0930 7/6/02, Richard Sharpe wrote: Hmmm, I think that the major part of the problem there was that, for whatever reason, Barry Feigenbaum of IBM declined to add a Change Working Directory or Set Working Directory command to the SMB protocol. Thus, at least for the SMB protocol, and maybe generally, Windows clients must always send the full pathname for every file they want, unless it happens to be at the root of the share.

Could the unix process for samba fake that? Keep track of the most recently used directory, and when a new request comes in split it into directory plus filename, and if the directory is the same as the previous one, then just access the filename. If the directory is different, try to do a chdir() to the new directory, and if that succeeds then save that as the previous directory.

Yes, it can do that, and should do that. I will have to check what Samba does. I know I proposed adding a path cache to smbclient/smbtar so that it could avoid repeatedly, and even a cache of one path could make a big difference.

Or is that more trouble than it's worth?

No, I think it is worth a lot. I suspect Samba already does that. There is just so much code to look at.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
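The one-entry directory cache suggested above can be sketched as follows. This is a minimal illustration, not Samba's code: struct dir_cache, dir_cache_init, dir_cache_enter and dry_run_chdir are all hypothetical names, and paths are assumed to be absolute within the share.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 1024
#endif

/* Hypothetical one-entry cache of the directory most recently entered. */
struct dir_cache {
    char last_dir[PATH_MAX];
    int chdir_calls;                    /* how often we really changed dir */
    int (*do_chdir)(const char *);      /* chdir(2), or a stand-in */
};

/* Stand-in for chdir(2) that always succeeds; handy for dry runs. */
int
dry_run_chdir(const char *dir)
{
    (void)dir;
    return 0;
}

void
dir_cache_init(struct dir_cache *dc, int (*do_chdir)(const char *))
{
    dc->last_dir[0] = '\0';
    dc->chdir_calls = 0;
    dc->do_chdir = (do_chdir != NULL) ? do_chdir : chdir;
}

/*
 * Split an absolute path into directory and file name, change directory
 * only if the directory differs from the cached one, and return a pointer
 * to the file name part (NULL if the directory could not be entered).
 */
const char *
dir_cache_enter(struct dir_cache *dc, const char *path)
{
    char dir[PATH_MAX];
    const char *slash;
    size_t dlen;

    assert(dc != NULL && path != NULL && path[0] == '/');
    slash = strrchr(path, '/');
    dlen = (slash == path) ? 1 : (size_t)(slash - path);
    if (dlen >= sizeof(dir))
        return NULL;
    memcpy(dir, path, dlen);
    dir[dlen] = '\0';

    if (strcmp(dir, dc->last_dir) != 0) {
        dc->chdir_calls++;
        if (dc->do_chdir(dir) != 0)
            return NULL;                /* cwd and cache are unchanged */
        strcpy(dc->last_dir, dir);
    }
    return slash + 1;
}
```

Passing NULL to dir_cache_init selects the real chdir(); the stand-in exists only so the caching logic can be exercised without touching the process's working directory.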
Re: broadcast packets not reaching sender on if_hwassist capable interfaces
On Fri, 31 May 2002, mark tinguely wrote:

Summary: FreeBSD 4.3 Broadcast IP datagrams not looping on Broadcom GigE card. Also, why is the decision to loop the packet back made in ether_output rather than in ip_output? Off the top of my head I can't see any particular advantage, but perhaps there is. The disadvantage is that I will have to disable hardware assist for broadcast packets to make things work right.

the bge sets the interface output to be ether_output() which is called from the ip_output() after all these flags have been set..

Two things I forgot to mention:
1. The driver we are using is a Broadcom proprietary driver, not the BGE driver.
2. Our driver is mistakenly setting the device as being in SIMPLEX mode :-)
And, I realized in the shower this morning, if you have to calculate the checksums on-board for any reason for that packet, there is no point in having the checksums calculated twice.

Are these packets that fail part of an IP fragment? m_copy does not copy the hwassist flags. we see the same problem with multicast packets. Bill Fenner wrote a similar multicast patch before FreeBSD 4.5-RELEASE, but even that fix has not been included into the tree yet either. In ip_output, we either need to copy the hwassist flags in m_copy or in the code that follows an m_copy, or force the checksum calculations.

Yes. We are investigating fixing our driver to make sure that it sets DUPLEX mode since our switch is capable of it.

if your packets are not part of a fragment, then let me know.

Nope, but thanks for your reply.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
broadcast packets not reaching sender on if_hwassist capable interfaces
Hi, I was chasing an interesting problem in our FreeBSD 4.3 codebase today. Broadcast IP datagrams were not being received by programs on the same system the IP datagrams were sent from. They were making it out the wire. They were being sent on a Broadcom GigE interface that has hardware checksumming.

After a while we realized that what was happening was that the packets were being marked for hardware checksumming in ip_output (m->m_pkthdr.csum_flags), but the decision about looping a packet back to ourselves is made in ether_output, so the looped-back copy is never checksummed and is discarded in ip_input.

Has this been fixed in a subsequent release? Also, why is the decision to loop the packet back made in ether_output rather than in ip_output? Off the top of my head I can't see any particular advantage, but perhaps there is. The disadvantage is that I will have to disable hardware assist for broadcast packets to make things work right.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
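One way to picture the failure described above: a packet whose checksum was deferred to the NIC must have that checksum finished in software before a copy is handed back to the local stack, because the looped-back copy never passes through the hardware. The sketch below is a userland illustration only. struct hyp_pkt, HYP_CSUM_DELAY_DATA and loopback_prepare are hypothetical stand-ins, not the mbuf API; in_cksum_sw is the ordinary RFC 1071 Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: "checksum has been left for the NIC to fill in". */
#define HYP_CSUM_DELAY_DATA 0x01

/* Hypothetical, much simplified stand-in for a packet header. */
struct hyp_pkt {
    const uint8_t *data;
    size_t len;
    int csum_flags;
    uint16_t csum;
};

/* RFC 1071 Internet checksum, computed in software over len bytes. */
uint16_t
in_cksum_sw(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    assert(p != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];    /* 16-bit big-endian word */
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;             /* pad the odd byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);     /* fold the carries */
    return (uint16_t)~sum;
}

/*
 * Before looping a packet back locally, finish any checksum that was
 * deferred to hardware and clear the flag.
 */
void
loopback_prepare(struct hyp_pkt *pkt)
{
    if (pkt->csum_flags & HYP_CSUM_DELAY_DATA) {
        pkt->csum = in_cksum_sw(pkt->data, pkt->len);
        pkt->csum_flags &= ~HYP_CSUM_DELAY_DATA;
    }
}
```

The packet that goes out on the wire can keep its deferred flag; only the local copy needs the software pass, so the cost is paid just for looped-back broadcasts.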
Re: File locking, closes and performance in a distributed file system env
On Tue, 14 May 2002, Terry Lambert wrote:

Richard Sharpe wrote: Hmmm, I wasn't very clear ... What I am proposing is a 'simple' fix that simply changes

    p->p_flag |= P_ADVLOCK;

to

    fp->l_flag |= P_ADVLOCK;

And never resets it, and then in closef,

    if ((fp->l_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
        lf.l_whence = SEEK_SET;
        lf.l_start = 0;
        lf.l_len = 0;
        lf.l_type = F_UNLCK;
        vp = (struct vnode *)fp->f_data;
        (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf, F_POSIX);
    }

Which still means that the correct functionality is implemented, but we only try to unlock files that have ever been locked before, or where we are sharing a file struct with another (related) process and one of them has locked the file.

Do you expect to share the same fp between multiple open instances for a given file within a single process? I think your approach will fail to implement proper POSIX file locking semantics. I really hate POSIX semantics, but you have to implement them exactly (at least by default), because programs are written to expect them. Basically, this means that if you open a file twice, lock it via the first fd, then close the second fd, all locks are released. In your code, it looks like what would happen is that when you closed the second fd, the fp->l_flag won't have the bit set. Correct me if I'm wrong? The reason for the extra overhead now is that you can't do this on an open instance basis because of POSIX, so it does it on a process instance basis.

OK, you have convinced me. I have looked at the POSIX spec in this area, and agree that I can't do what I want to do.

The only other alternative is to do it on a vp basis -- and since multiple fp's can point to the same vp, your option #2 will fail, as described above, but my suggestion to do the locking locally, which will associate it with the vp (or the v_data, depending on which version of FreeBSD, and where the VOP_ADVLOCK hangs the lock list off of: the vnode or the inode), will maintain the proper semantics. Your intent isn't really to avoid the VOP_ADVLOCK call, it's to avoid making an RPC call to satisfy the VOP_ADVLOCK call, right?

Yes, correct. We will have to do it in the vnode layer as you suggest. Currently we are using 4.3 and moving to 4.5, so we will have to figure out the differences.

You can't really avoid *all* the avoidable overhead, without restructuring the VOP_ADVLOCK interface, which is politically difficult.

I wouldn't want to try. Too much code to change and too much chance of a massive screw-up. Thanks for persevering with me.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
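The POSIX behaviour Terry describes (open a file twice, lock through the first descriptor, close the second, and the lock is gone) can be observed from userland. The following is a small sketch, assuming a writable /tmp on a file system that supports fcntl() locks; the function names are hypothetical. A child process is used for the check because F_GETLK never reports a process's own locks as conflicts.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() under strict compilers */

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ask a child process whether a write lock on the whole file would
 * conflict with a lock held by someone else (here: the parent).
 * Returns 1 if a conflicting lock is seen, 0 if not, -1 on error.
 */
int
lock_seen_by_other_process(const char *path)
{
    pid_t pid;
    int status;

    assert(path != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl;
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;         /* l_start = l_len = 0: whole file */
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}

/*
 * result[0]: is the lock visible before the second descriptor is closed?
 * result[1]: is it still visible afterwards?
 * Returns 0 if the experiment ran, -1 on setup failure.
 */
int
demo_close_drops_locks(int result[2])
{
    char path[] = "/tmp/posixlock.XXXXXX";
    struct flock fl;
    int fd1, fd2;

    fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDONLY);         /* second, independent open */
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fd2 < 0 || fcntl(fd1, F_SETLK, &fl) < 0) {
        if (fd2 >= 0)
            close(fd2);
        close(fd1);
        unlink(path);
        return -1;
    }
    result[0] = lock_seen_by_other_process(path);
    close(fd2);     /* fd2 was never used for locking, yet this matters */
    result[1] = lock_seen_by_other_process(path);
    close(fd1);
    unlink(path);
    return 0;
}
```

This is why a flag kept per open instance cannot tell closef() when it is safe to skip the unlock: the close that drops the lock may be on a struct file that never saw a lock request.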
File locking, closes and performance in a distributed file system env
Hi, I might be way off base here, but we have run into what looks like a performance issue with locking and file closes. We have implemented a distributed file system and were looking at some performance issues.

At the moment, if a process locks a file, the code in kern_descrip.c that handles it does the following:

    p->p_flag |= P_ADVLOCK;

to indicate that files might be locked. Later in closef, we see the following:

    if (p && (p->p_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
        lf.l_whence = SEEK_SET;
        lf.l_start = 0;
        lf.l_len = 0;
        lf.l_type = F_UNLCK;
        vp = (struct vnode *)fp->f_data;
        (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf, F_POSIX);
    }

This seems to mean that once a process locks a file, every close after that will pay the penalty of calling the underlying vnode unlock call. In a distributed file system, with a simple implementation, that could be an RPC to the lock manager to implement.

Now, there seem to be a few ways to mitigate this:

1. Keep (more) state at the vnode layer that allows us to not issue a network traversing unlock if the file was not locked. This means that any process that has opened the file will have to issue the network traversing unlock request once the flag is set on the vnode.

2. Place a flag in the struct file structure that keeps the state of any locks on the file. This means that any processes that share the struct (those related by fork) will need to issue unlock requests if one of them locks the file.

3. Change a file descriptor table that hangs off the process structure so that it includes state about whether or not this process has locked the file.

It seems that each of these reduces the performance penalty that processes that might be sharing the file, but which have not locked the file, might have to pay. Option 2 looks easy. Are there any comments?

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Re: File locking, closes and performance in a distributed file system env
On Tue, 14 May 2002, Terry Lambert wrote:

Hmmm, I wasn't very clear ... What I am proposing is a 'simple' fix that simply changes

    p->p_flag |= P_ADVLOCK;

to

    fp->l_flag |= P_ADVLOCK;

And never resets it, and then in closef,

    if ((fp->l_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
        lf.l_whence = SEEK_SET;
        lf.l_start = 0;
        lf.l_len = 0;
        lf.l_type = F_UNLCK;
        vp = (struct vnode *)fp->f_data;
        (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf, F_POSIX);
    }

Which still means that the correct functionality is implemented, but we only try to unlock files that have ever been locked before, or where we are sharing a file struct with another (related) process and one of them has locked the file.

Richard Sharpe wrote: I might be way off base here, but we have run into what looks like a performance issue with locking and file closes. [ ... ] This seems to mean that once a process locks a file, every close after that will pay the penalty of calling the underlying vnode unlock call. In a distributed file system, with a simple implementation, that could be an RPC to the lock manager to implement.

Yes. This is pretty much required by the POSIX locking semantics, which require that the first close remove all locks. Unfortunately, you can't know on a per process basis that there are no locks remaining on *any* vnode for a given process, so the overhead is sticky.

Now, there seem to be a few ways to mitigate this:

1. Keep (more) state at the vnode layer that allows us to not issue a network traversing unlock if the file was not locked. This means that any process that has opened the file will have to issue the network traversing unlock request once the flag is set on the vnode.

2. Place a flag in the struct file structure that keeps the state of any locks on the file. This means that any processes that share the struct (those related by fork) will need to issue unlock requests if one of them locks the file.

3. Change a file descriptor table that hangs off the process structure so that it includes state about whether or not this process has locked the file.

It seems that each of these reduces the performance penalty that processes that might be sharing the file, but which have not locked the file, might have to pay. Option 2 looks easy. Are there any comments?

#3 is really unreasonable. It implies non-coalescing. I know that CIFS requires this, and so does NFSv4, so it's not an unreasonable thing to do eventually (historical behaviour can be maintained by removing all locks in the overlap region on an unlock, yielding logical coalescing). The amount of things that will need to be touched by this, though, means it's probably not worth doing now.

In reality, for remote FS's, you want to assert the lock locally before transiting the network anyway, in case there is a local conflict, in which case you avoid propagating the request over the network. For union mounts of local and remote FS's, for which there is a local lock against the local FS by another process that doesn't respect the union (a legitimate thing to have happen), it's actually a requirement, since the remote system may promote or coalesce locks, and that means that there is no reverse process for a remote success followed by a local failure.

This is basically a twist on #1:

a) Assert the lock locally before asserting it remotely; if the assertion fails, then you have avoided a network operation which is doomed to failure (the RPC call you are trying to avoid is similar).

b) When unlocking, verify that the lock exists locally before attempting to deassert it remotely.

This means that there is still the same local overhead as there always was, but at least you avoid the RPC in the case where there are no outstanding locks that will be cleared by the call.

I've actually wanted the VOP_ADVLOCK to be veto-based for going on 6 years now, to avoid precisely the type of problems you are now facing. If the upper layer code did local assertion on vnodes, and called the lower layer code only in the success cases, then the implementation would actually be done for you already.

-- Terry

-- Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
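Steps (a) and (b) in Terry's reply can be sketched as a toy model. Everything here is hypothetical (struct hyp_vnode, the hyp_* functions, and the counter that stands in for network round trips); it is not the VOP_ADVLOCK interface, and it tracks only a lock count rather than byte ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vnode state: locally recorded locks and RPCs issued. */
struct hyp_vnode {
    int local_locks;
    int rpc_calls;
};

/* Stand-ins for the network round trips to a remote lock manager. */
static int
hyp_remote_lock_rpc(struct hyp_vnode *vp)
{
    vp->rpc_calls++;
    return 0;
}

static int
hyp_remote_unlock_rpc(struct hyp_vnode *vp)
{
    vp->rpc_calls++;
    return 0;
}

/*
 * (a) Assert the lock locally first; only on local success go remote.
 * local_conflict models "another local process already holds a
 * conflicting lock".  Returns 0 on success, -1 on failure.
 */
int
hyp_advlock_set(struct hyp_vnode *vp, int local_conflict)
{
    assert(vp != NULL);
    if (local_conflict)
        return -1;              /* doomed locally: no RPC is issued */
    vp->local_locks++;
    if (hyp_remote_lock_rpc(vp) != 0) {
        vp->local_locks--;      /* remote refusal: undo the local record */
        return -1;
    }
    return 0;
}

/*
 * (b) On close, only go remote if a lock was actually recorded locally.
 */
int
hyp_advlock_release_all(struct hyp_vnode *vp)
{
    assert(vp != NULL);
    if (vp->local_locks == 0)
        return 0;               /* nothing to clear: skip the RPC */
    vp->local_locks = 0;
    return hyp_remote_unlock_rpc(vp);
}
```

The local bookkeeping still runs on every close, as the message notes, but a close on a file that was never locked costs no network traffic.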
Re: Broadcom BCM5701 Chipset problems
On Mon, 13 May 2002, David Greenman-Lawrence wrote:

David Greenman-Lawrence wrote: If you aren't using VLAN tagging, you shouldn't care.

No, that is absolutely not correct. The checksum problems happened in many situations, depending on the chipset and other factors. The problem that resulted in the commit to disable the receive hardware checksum was caused by small packets with certain byte patterns, NOT VLAN ENCAPSULATION.

Are you sure you are talking about the Tigon III, and not the Tigon II?

Yes, of course. I'm talking specifically about the Broadcom BCM570x. My particular experience was with the Syskonnect 9D21 and 9D41 boards which both use the Altima chip.

I have seen checksum problems with the 5700 ... Can you tell me which steppings of the 5701 you are seeing the problems with? Is it with 1500-byte frames, jumbo frames, or both?

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Re: sendfile() in tftpd?
On Tue, 23 Apr 2002, Attila Nagy wrote:

Hello, No, sendfile() is only for TCP connections, TFTP is using UDP. If you want performance, use something else. It's even in the manpage: Sendfile() sends a regular file specified by descriptor fd out a stream socket specified by descriptor s.

Silly me. BTW, I can't use anything else. Are there any alternatives to TFTP for booting machines off the network? (using standard, PC components)

Multicast! BootIX (nee InCom) have support for this in their BootROMS. It might not be hard to hack into Etherboot et al.

Regards - Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
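The manpage restriction quoted above (sendfile() wants a stream socket) can be checked at run time before choosing a transfer path. A minimal sketch using the standard SO_TYPE socket option; the function name socket_is_stream is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Returns 1 if descriptor s is a stream socket (the kind sendfile()
 * is documented to accept), 0 for any other socket type, -1 on error.
 */
int
socket_is_stream(int s)
{
    int type = 0;
    socklen_t len = sizeof(type);

    assert(s >= 0);
    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type == SOCK_STREAM;
}
```

A TFTP server's UDP socket would report SOCK_DGRAM here, so the caller would have to stay on the ordinary read()/sendto() path.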
Is there any way from userspace to get the 'enum vtagtype v_tag' for a vnode for an open file?
Hi, I want to find out the file system type from userspace for an open file. Can I get at this info? The stat call does not give it to me. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: Is there any way from userspace to get the 'enum vtagtype v_tag' for a vnode for an open file?
Richard Sharpe wrote: Hi, I want to find out the file system type from userspace for an open file. Can I get at this info? The stat call does not give it to me.

Hmmm, getfsspec seems to fill the need. Sorry for the noise. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Adding si_fd to struct __siginfo ...
Hi, One of my tasks is to add oplock support to FreeBSD so that we (Panasas) can allow correct caching of files by Windows clients in the presence of NFS clients using the same files. We have a preliminary implementation, based on the Linux implementation, but it is a gross hack because there is no way for the kernel, when it delivers a signal, to indicate the fd that caused delivery of the signal.

Linux and Solaris have an fd field in struct siginfo_t which allows the kernel to indicate, for signals relating to files, which fd the signal relates to. I notice that in FreeBSD struct siginfo_t seems to have int __spare__[7]; and would like to use one of those spare fields as si_fd.

While I can do that in our code base, if I want to contribute the OpLock code it would be useful if the FreeBSD community finds this change agreeable. Are there any counter suggestions or any big objections? -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: Adding si_fd to struct __siginfo ...
Alfred Perlstein wrote:

* Richard Sharpe [EMAIL PROTECTED] [011221 15:11] wrote: Hi, One of my tasks is to add oplock support to FreeBSD so that we (Panasas) can allow correct caching of files by Windows clients in the presence of NFS clients using the same files. We have a preliminary implementation, based on the Linux implementation, but it is a gross hack because there is no way for the kernel, when it delivers a signal, to indicate the fd that caused delivery of the signal. Linux and Solaris have an fd field in struct siginfo_t which allows the kernel to indicate, for signals relating to files, which fd the signal relates to. I notice that in FreeBSD struct siginfo_t seems to have int __spare__[7]; and would like to use one of those spare fields as si_fd. While I can do that in our code base, if I want to contribute the OpLock code it would be useful if the FreeBSD community finds this change agreeable. Are there any counter suggestions or any big objections?

There was already a big mess of a discussion about how this would be much better done via kqueue than with realtime signals. I guess if you can get a working implementation that is compatible with the existing interfaces it would work, however it's a _much_ better idea to use kqueue to deliver this sort of notification. And yes, it has been discussed in the lists already.

OK, I will go and look at the discussion ... -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: Adding si_fd to struct __siginfo ...
Mike Barcroft wrote:

Richard Sharpe [EMAIL PROTECTED] writes: There was already a big mess of a discussion about how this would be much better done via kqueue than with realtime signals. I guess if you can get a working implementation that is compatible with the existing interfaces it would work, however it's a _much_ better idea to use kqueue to deliver this sort of notification.

Well, it turns out that there are two problems with what I suggested: 1, signals are lossy, in that if multiple signals occur, only one might be delivered; and 2, there is no place to store any signal-related information in the kernel, in any case. So, it seems like kqueue is really the only game in town, but I will need an appropriate filter, and it would be nice if I could get some sort of async notification that there were events ready to be processed, as I really don't want to rewrite Samba completely, just to support kqueue ... Hmmm, perhaps the approach should be to signal that leases/oplocks have been broken, but provide the details via kqueue.

And yes, it has been discussed in the lists already. OK, I will go and look at the discussion ...

Unfortunately this discussion mistakenly took place on a FreeBSD mailing list intended for administrative-only issues, so it isn't publicly available on our end. Luckily, a Samba mailing list was on the CC line. You should be able to find it on the [EMAIL PROTECTED] archives circa September 2001. Best regards, Mike Barcroft

-- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
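The "signals are lossy" point above is easy to reproduce with an ordinary (non-realtime) signal: raise it several times while it is blocked, then unblock it and count how often the handler runs. A minimal sketch follows; the names are hypothetical, it assumes a single-threaded process with SIGUSR1 not already blocked, and POSIX leaves the exact count implementation-defined (FreeBSD and Linux typically report a single delivery).

```c
#define _POSIX_C_SOURCE 200809L /* for sigaction() under strict compilers */

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t hyp_deliveries;

static void
hyp_on_usr1(int signo)
{
    (void)signo;
    hyp_deliveries++;
}

/*
 * Raise SIGUSR1 'times' times while it is blocked, then restore the
 * previous mask and report how many times the handler actually ran.
 * Returns -1 on setup failure.
 */
int
count_deliveries_of_blocked_signal(int times)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    int i;

    assert(times > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = hyp_on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0) {
        sigaction(SIGUSR1, &old_sa, NULL);
        return -1;
    }

    hyp_deliveries = 0;
    for (i = 0; i < times; i++)
        raise(SIGUSR1);         /* stays pending; does not pile up */

    /* Restoring the old mask lets the pending signal be delivered. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return (int)hyp_deliveries;
}
```

With one delivery standing in for several events, the handler cannot say which files were affected, which fits the idea in the message of using the signal only as a wake-up and fetching the details from a queue such as kqueue.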
Re: Does anyone know if the Broadcom BCM5700 has problems with HW csum?
David Greenman wrote:

I am playing with a driver for the Broadcom 5700/5701. It recognizes the 5700 in my 3Com cards OK, but seems to screw up the TCP checksum. Switching off hardware checksum capability fixes it. Does anyone know the details of which stepping this stuff worked on?

I haven't nailed down the problem that I've seen with them to a specific chipset, but I can confirm that they incorrectly calculate the checksum on input packets in some cases. It seems to be related to both packet size and certain TCP options (or lack of them). I've only seen the problem occur with very small (0-4 byte payload) packets. In any case, after discussing this problem with Bill Paul, I disabled input checksum in the -current driver and intend to merge that to -stable in a few days.

OK, that makes sense, because I wasn't getting past first base. SYN ACK segments were being rejected with bad checksum. The driver I modified is actually for the 5701, which works fine with all checksum offloading enabled. I will try to disable just receive TCP checksum and see what happens. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Does anyone know if the Broadcom BCM5700 has problems with HW csum?
Hi, I am playing with a driver for the Broadcom 5700/5701. It recognizes the 5700 in my 3Com cards OK, but seems to screw up the TCP checksum. Switching off hardware checksum capability fixes it. Does anyone know the details of which stepping this stuff worked on? -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
FreeBSD might not have been slower than Linux in the real world ...
Hi, As you might have seen, there was a problem in the FreeBSD code that Matt Dillon fixed recently. This problem involved the flag TCP_NODELAY not being propagated across an accept() call. This resulted in tbench runs turning in very poor performance under FreeBSD compared with Linux.

However, in the real world, ie in Samba, this might not have been a problem at all. I believe, but will not be able to check for a little while now, that Samba was doing the setsockopt() call after the accept() call, and indeed, after the fork() call when a new smbd is fork'd to handle the new connection. Since TCP_NODELAY is the default in Samba's socket options, Samba under FreeBSD was probably always getting the benefit of that 68Mb/s that it seems possible to get using the SMB protocol on a 100Mb/s link.

However, it is good that FreeBSD also gets good numbers under the benchmarks. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
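The pattern described above, setting TCP_NODELAY on the accepted socket itself instead of relying on the option being inherited from the listening socket, looks roughly like this. A minimal sketch; the helper names are hypothetical and error handling is reduced to return codes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn off Nagle's algorithm on a TCP socket; 0 on success, -1 on error. */
int
set_tcp_nodelay(int s)
{
    int on = 1;

    assert(s >= 0);
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Report whether TCP_NODELAY is set: 1 yes, 0 no, -1 on error. */
int
get_tcp_nodelay(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}

/*
 * Accept a connection and set TCP_NODELAY on the new socket explicitly,
 * so the result does not depend on what accept() propagates.
 * Returns the connected descriptor, or -1 on error.
 */
int
accept_nodelay(int listen_fd)
{
    int s = accept(listen_fd, NULL, NULL);

    if (s >= 0 && set_tcp_nodelay(s) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A server that applies its socket options per connection in this way would not have been affected by the propagation bug, which is the point the message makes about Samba versus tbench.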
Re: Patch #3 (TCP / Linux / Performance)
OK Matt, that last patch did the trick. I am now getting 68 and 69Mb/s between my Linux system and the FreeBSD system. I have also tried the loopback interface, and I am getting 371Mb/s for 1 process, dropping to about 320Mb/s for 5. This seems like it is close to the limit for the machine I am using, as CPU hits 100% when I ran the above tbench runs. I will have to try it with Gigabit Ethernet, but won't be able to do so until next week or the week after (after I get to the US). Does the FreeBSD tcp stack do zero copy (page flip the data to userspace)? In the localhost case, it seems like there are two copies to/from userspace there. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Samba performance compared between FreeBSD and Linux ...
Hi, It seems like all of the issues uncovered have been fixed, so it seems like you cannot use performance as a way to choose between FreeBSD and Linux any longer. I will re-issue my report, but I do not have any more time to spend on this now for several days. I will most likely re-run the tests when I get to the US later this week, and would hope to re-issue the report the week after next. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: Found the problem, w/patch (was Re: FreeBSD performing worse than Linux?)
Matthew Dillon wrote:

Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.39.2.10
diff -u -r1.39.2.10 tcp_output.c
--- tcp_output.c 2001/07/07 04:30:38 1.39.2.10
+++ tcp_output.c 2001/11/30 21:18:10
@@ -912,7 +912,14 @@
         tp->t_flags &= ~TF_ACKNOW;
         if (tcp_delack_enabled)
                 callout_stop(tp->tt_delack);
+#if 0
+        /*
+         * This completely breaks TCP if newreno is turned on
+         */
         if (sendalot && (!tcp_do_newreno || --maxburst))
+                goto again;
+#endif
+        if (sendalot)
                 goto again;
         return (0);
 }

OK, I have applied this patch, and FreeBSD 4.4-STABLE now seems to behave approximately the same as Linux. There are no extra ACKs, and FreeBSD now coalesces pairs of ACKs. However, performance for one client is still at 25Mb/s with the tbench run, while Linux provides around 68Mb/s. So, it is back to staring at traces. Perhaps I will get a full trace now. -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: UDMA33 and SiS5591 on FreeBSD 4.4-RELEASE
Greg Lehey wrote:

On Saturday, 1 December 2001 at 13:05:53 +0100, Søren Schmidt wrote: It seems Zwane Mwaikambo wrote: Hi, I've got a box which boots up with UDMA33 but during the boot sequence gets write problems and ends up disabling it and i presume falling back to PIO4. I've tested the same box on Linux 2.4.2+ and have had no problems running it at UDMA33. Host: SiS 5591 (revision?) Disk: Seagate 3.2G ATA2

Ohhh, I need a lot more info before I can tell what's going on.. I need at least the dmesg from a verbosely booted system and also a pciconf -l to tell what chips you have.

Note that there are other chips out there which return the same PCI information but which appear to be capable of ATA 100. I recently gave a patch to Richard Sharpe (copied) which he says was able to get his SiS 5591 to run at ATA 100. I'm still waiting for feedback from him before forwarding it to you. I also have a machine with a SiS 5591 which can't go beyond ATA 33. Here are the pciconf outputs for each chip:

Mine (ATA 33):
atapci0@pci0:0:1: class=0x01018a card=0x chip=0x55131039 rev=0xd0 hdr=0x00

Richard's (ATA 100):
pci0:0:1 Class=0x010180 card=0x55131039 chip=0x55131039 red=0xd0 hdr=0x00

Dwayne's:
atapci0@pci0:0:1: class=0x010180 card=0x55131039 chip=0x55131039 rev=0xd0 hdr=0x00

I don't understand why Richard's output is missing the atapci@ at the beginning. I believe he was using 4.3-RELEASE at this point; mine was from 4-STABLE of May this year.

Here is what pciconf gives me:
atapci0@pci0:0:1: class=0x010180 card=0x55131039 chip=0x55131039 rev=0xd0 hdr=0x00

Attached is the patch I am using, which is based on what Greg gave me. It tries UDMA5 first, and steps down ...

-- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba

--- ata-dma.c.orig Wed Oct 31 07:29:52 2001
+++ ata-dma.c Fri Nov 30 14:38:52 2001
@@ -519,30 +519,61 @@
         break;
     case 0x55131039: /* SiS 5591 */
-        if (udmamode >= 2 && pci_get_revid(parent) > 0xc1) {
-            error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
-                                ATA_UDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
-            if (bootverbose)
-                ata_printf(scp, device,
-                           "%s setting UDMA2 on SiS chip\n",
-                           (error) ? "failed" : "success");
-            if (!error) {
-                pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
-                scp->mode[ATA_DEV(device)] = ATA_UDMA2;
-                return;
+        if (bootverbose)
+            printf ("SiS 5513/5591, udmamode %d\n", udmamode);
+        if (pci_get_revid(parent) > 0xc1) {
+            udmamode = 5; /* Force it to 100 */
+            if (udmamode >= 5) {/* Claims UDMA 100 */
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA5, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device,
+                               "%s setting UDMA5 on SiS chip\n",
+                               (error) ? "failed" : "success");
+                if (!error) {
+                    pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
+                    scp->mode[ATA_DEV(device)] = ATA_UDMA5;
+                    return;
+                }
             }
-        }
-        if (wdmamode >=2 && apiomode >= 4) {
-            error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
-                                ATA_WDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
-            if (bootverbose)
-                ata_printf(scp, device,
-                           "%s setting WDMA2 on SiS chip\n",
-                           (error) ? "failed" : "success");
-            if (!error) {
-                pci_write_config(parent, 0x40 + (devno << 1), 0x0301, 2);
-                scp->mode[ATA_DEV(device)] = ATA_WDMA2;
-                return;
+            if (udmamode >= 4) {/* Claims UDMA 66 */
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA4, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device,
+                               "%s setting UDMA4 on SiS chip\n",
+                               (error) ? "failed" : "success");
+                if (!error) {
+                    pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
+                    scp->mode[ATA_DEV(device)] = ATA_UDMA4;
+                    return;
+                }
+            }
+            if (udmamode >= 2) {
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device
Re: FreeBSD performing worse than Linux?
Alfred Perlstein wrote:

* Richard Sharpe [EMAIL PROTECTED] [011130 15:02] wrote: The traffic in the tbench case is SMB traffic. Request/response, with a mixture of small requests and responses, and big request/small response or small request/big response, where big is 64K. I have switched off newreno, and it made no difference. I have switched off delayed_ack, and it reduced performance about 5 percent. I have made sure that SO_SNDBUF and SO_RCVBUF were set to 131072 (which seems to be the max), and it increased performance marginally (like about 2%), but consistently. I am still analysing the packet traces I have, but it seems to me that the crucial difference is Linux seems to delay longer before sending ACKs, and thus sends fewer ACKs. Since the ACK is piggybacked in the response (or the next request), it all works fine, and the response/request gets there sooner. However, I have not convinced myself that the saving of 20uS or so per request/response pair accounts for some 40+ Mb/s.

Can you try these two commands:

    sysctl -w net.inet.tcp.recvspace=65536
    sysctl -w net.inet.tcp.sendspace=65536

Yes, that is what I did ... -- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
Re: FreeBSD performing worse than Linux?
Hi,

I think that there are two different problems here. My situation involves a LAN (actually, a crossover cable).

I have captured a trace of a 1 client run between the Linux driver and the FreeBSD test system as well as between the Linux driver and the same test system running Linux. I am noticing some interesting things.

Linux uses the timestamp option in all the TCP segments I have looked at, so it is sending 12 more bytes per segment than FreeBSD. However, more interesting is that for small messages (less than 1460), FreeBSD does not seem to delay sending ACKs, so we get the following pattern:

FREEBSD
Driver -> Test system: 94 byte IP DG with simulated command
Test System -> Driver: Ack after 83uS
Test System -> Driver: Psh Ack after 29uS with 79 total bytes in IP DG

LINUX
Driver -> Test system: 106 byte IP DG with simulated command
Test System -> Driver: Psh Ack after 89uS with 91 total bytes in IP DG

So, as you can see, Linux seems to shave some time off each transaction by avoiding sending extra ACKs.

Also, what I am seeing is that neither FreeBSD nor Linux is doing ACK coalescing (if that is possible). While I understand that coalescing ACKs will mess up RTT calculations and SRTT a bit, it would serve to reduce the time taken until responses come back.

What I am seeing for large transmits is the following:

Linux (Driver) -> FreeBSD (Test): Request, 1500 bytes including request and some data
Linux (Driver) -> FreeBSD (Test): More segments from the request
FreeBSD (Test) -> Linux (Driver): Some ACKs, about one every two segments
Linux (Driver) -> FreeBSD (Test): Last data segment, usually less than 1500
FreeBSD (Test) -> Linux (Driver): Lots of ACKs, one per segment, usually with large window (ie 16020 when the max window seems to be 16384)
FreeBSD (Test) -> Linux (Driver): Response, less than 1500

Now, I have seen something like 10+ ACKs after the driver has finished sending. They appear to be one per sent segment. Then the FreeBSD system sends its response. The optimal would be for the FreeBSD system to delay the ACK until it has data to send, which it probably already has.

What I see with the Linux trace is that Linux coalesces ACKs. However, the most I have seen it coalesce is two segments.

HTH.

-- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba
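The small-message numbers in this message can be checked with a little arithmetic. The sketch below assumes that each "after" time is measured from the previous packet in the trace, so the FreeBSD reply arrives 83 + 29 = 112uS after the request while the Linux reply arrives after 89uS, a difference of 23uS per exchange. The 12-byte size difference in both directions matches the timestamp option overhead. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One small request/response exchange as described in the traces. */
struct exchange {
    int request_bytes;   /* size of the IP datagram carrying the command */
    int response_bytes;  /* size of the IP datagram carrying the reply */
    int bare_ack_us;     /* delay before a separate bare ACK, 0 if none */
    int response_us;     /* delay before the Psh/Ack carrying the reply */
};

/*
 * Time from request to reply in microseconds, assuming the delays are
 * successive deltas (an assumption about how the trace was read).
 */
int
reply_latency_us(const struct exchange *e)
{
    assert(e != NULL);
    return e->bare_ack_us + e->response_us;
}

/* Numbers copied from the FREEBSD and LINUX patterns in the message. */
const struct exchange freebsd_trace = { 94, 79, 83, 29 };
const struct exchange linux_trace = { 106, 91, 0, 89 };
```

Calling reply_latency_us() on the two traces and subtracting gives the per-exchange saving, which is in line with the "20uS or so" figure mentioned earlier in the thread.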
A comparison of Samba performance on FreeBSD 4.3-RELEASE and Linux 2.4.13ac4
Hi,

attached is a preliminary report on a comparison of Samba performance on FreeBSD 4.3-RELEASE and Linux 2.4.13ac4. I have posted it because I promised to do so; however, I think you should take the numbers with a grain of salt.

It demonstrates that overall, for the client tests I did (including up to 100 clients, but not reported), on the same hardware (well, pretty much), the two operating systems are comparable.

Until I resolve why FreeBSD thinks that my SIS730-equipped PCchips 810MLR board has only a UDMA33 controller while Linux thinks it is capable of UDMA100, the dbench numbers do not mean much.

In addition, I continue to look at the tbench numbers to see what the story is with respect to FreeBSD and TCP performance. Perhaps I have done something wrong.

Feedback welcome, but I will have limited time to respond over the next week.

-- Richard Sharpe, [EMAIL PROTECTED], LPIC-1 www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba in 24 Hours, Special Edition, Using Samba

Measuring the performance of Samba under FreeBSD and Linux
Richard Sharpe
25-Nov-2001

INTRODUCTION

One of the tools available for measuring the performance of Samba and other CIFS servers is NetBench, the ZiffDavis benchmark. This benchmark attempts to simulate the workload applied to a CIFS server by one or more CIFS clients. This workload is based on a network trace.

The Samba team has developed a number of tools that can be used to gain a feel for the ability of various parts of a Samba system (the whole system, the file system, or the networking subsystem) to handle the NetBench workload. These tools are: smbtorture, which can provide an indication of the ability of a server to handle the NetBench load from one or more clients; dbench, which can provide an indication of the ability of the filesystem on a server to handle the workload offered by one or more clients; and tbench, which can provide an indication of the ability of the networking subsystem to handle the workload offered by one or more NetBench clients.

The smbtorture test uses a trace taken from a NetBench run and replays the trace for the number of clients specified. Each client uses a separate area on the server, but their file system areas are all created in the same directory, eg \\server\public\CLIENTS\CLIENT0, \\server\public\CLIENTS\CLIENT1, and so on. The test involves reading and writing large files using large reads (up to 65535 bytes).

The dbench test takes the NetBench trace and applies all the file IOs that it would cause, without causing any network traffic or involving any protocol handling. Thus it only tests file system performance. It creates its working directories in the same way as smbtorture/NetBench.

The tbench test exercises the networking system between the two machines. It sends exactly the same amount of data that a NetBench test would, but does no file system activity, nor any protocol handling, etc.

I set out to measure the relative ability of FreeBSD 4.3-RELEASE and Linux 2.4.13-ac4 (on a RedHat 7.2 system) to handle the workload presented by 10, 20, 30, 40, and 50 NetBench clients. To understand better the limits of each operating system, I also ran dbench and tbench against both FreeBSD 4.3-RELEASE and Linux 2.4.13.

The results show that both operating systems can provide similar levels of performance, so performance is not a metric that can be used to choose between them. An interesting result is that Linux seems to be better at driving a 100Mb/s link, as well as providing higher file system throughput, but tuning FreeBSD might improve its performance.

The rest of this report covers the methods that I used to set up and run the tests, the results I obtained, and some observations I made while running the tests, and provides some conclusions.

METHOD

FreeBSD 4.3-RELEASE was loaded onto a 30GB IBM disk drive, while RedHat 7.2 was loaded onto a 20GB Western Digital drive. The 2.4.7 kernel on the RedHat system was replaced with Linux 2.4.13ac4, and the EXT3 file system was used, while the FreeBSD system used the standard file system with soft updates enabled. The drives were an IBM DTLA 307030 UDMA33 for FreeBSD and a WDC WD200EM UDMA100 for Linux.

These drives were then booted on a Duron-750MHz based system with a PC Chips motherboard (SIS 730 chipset) with 1GB of memory. The system had two Ethernet controllers, both 100Mb/s. The controller used for the test was a 3C905B.

A recent CVS version of Samba (2.2.3pre) was built on each system using the same, default, options, and then installed. Similar smb.conf files were built for each. The config files are shown in the appendix.

The NetBench tests were all run from the one driver system, a dual-Celeron 533MHz Abit BP6 with 384MB of memory. They were all performed across a single 100Mb/s link.

The dbench tests were run directly on the test system under FreeBSD and Linux. The dbench source and make were