from:"Richard Sharpe"

Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-09 Thread Richard Sharpe

On Tue, 2013-01-08 at 09:20 -0800, Adrian Chadd wrote:
 On 8 January 2013 08:15, Richard Sharpe rsha...@richardsharpe.com wrote:
  On Tue, 2013-01-08 at 07:36 -0800, Adrian Chadd wrote:
  .. or you could abstract it out a bit and use freebsd's
  aio_waitcomplete() or kqueue aio notification.
 
  It'll then behave much saner.
 
  Yes, going forward that is what I want to do ... this would work nicely
  with a kqueue back-end for Samba's tevent subsystem, and if someone has
  not already written such a back end, I will have to do so, I guess.
 
 Embrace FreeBSD's nice asynchronous APIs for doing things! You know you want 
 to!
 
 (Then, convert parts of samba over to use grand central dispatch... :-)
 
 Seriously though - I was doing network/disk IO using real time signals
 what, 10 + years ago on Linux and it plain sucked. AIO + kqueue +
 waitcomplete is just brilliant. kqueue for signal delivery is also
 just brilliant. Just saying.

The problem with a fully event-driven approach is that it will not work,
it seems to me. Eventually, you find something that is not async and
then you have to go threaded. (Because handling multiple clients in one
process is very useful and you do not want client-A's long-running op
preventing client-B's short-running op from being serviced.)

Then, you run into problems like POSIX's insistence that all threads in
a process must use the same credentials (i.e., uid and gids must be the
same across all threads), although there is a hack on Linux to work
around this behind glibc's back.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-09 Thread Richard Sharpe

On Wed, 2013-01-09 at 10:06 +0800, David Xu wrote:
 [...]
  This code won't work: as I said, after the signal handler returns, the
  kernel copies the signal mask contained in the ucontext into kernel
  space and uses it for future signal delivery.
 
  The code should be modified as following:
 
  void handler(int signum, siginfo_t *info, ucontext_t *uap)
  {
  ...
 
if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
sigaddset(&uap->uc_sigmask, signum);
 
  Hmmm, this seems unlikely because the signal handler is operating in
  user mode and has no access to kernel-mode variables.
 
  Well, it turns out that your suggestion was correct.
 
  I did some more searching and found another similar suggestion, so I
  gave it a whirl, and it works.
 
  Now, my problem is that Jeremy Allison thinks that it is a fugly hack.
  This means that I will probably have big problems getting a patch for
  this into Samba.
 
  I guess a couple of questions I have now are:
 
  1. Is this the same for all versions of FreeBSD since Posix RT Signals
  were introduced?
 
 
 I have checked the source code and found that RT signals have been
 supported since FreeBSD 7.0, and that the aio code uses the signal queue.
 
  2. Which (interpretation of which) combination of standards require such
  an approach?
 
 
 The approach I described is standard:
 http://pubs.opengroup.org/onlinepubs/007904975/functions/sigaction.html
 
 I quoted some text here:
 
 When a signal is caught by a signal-catching function installed by 
 sigaction(), a new signal mask is calculated and installed for the 
 duration of the signal-catching function (or until a call to either 
 sigprocmask() or sigsuspend() is made). This mask is formed by taking 
 the union of the current signal mask and the value of the sa_mask for 
 the signal being delivered [XSI] [Option Start]  unless SA_NODEFER or 
 SA_RESETHAND is set, [Option End] and then including the signal being 
 delivered. If and when the user's signal handler returns normally, the 
 original signal mask is restored.
 
 ...
 
 When the signal handler returns, the receiving thread resumes execution 
 at the point it was interrupted unless the signal handler makes other 
 arrangements. If longjmp() or _longjmp() is used to leave the signal 
 handler, then the signal mask must be explicitly restored.
 
 This volume of IEEE Std 1003.1-2001 defines the third argument of a 
 signal handling function when SA_SIGINFO is set as a void * instead of a 
 ucontext_t *, but without requiring type checking. New applications 
 should explicitly cast the third argument of the signal handling
 function to ucontext_t *.
 
 ---
 
 The above means the third parameter points to a ucontext_t, which is used
 to restore the previously interrupted context; that context contains
 a signal mask, which is also restored.
 http://pubs.opengroup.org/onlinepubs/007904975/basedefs/ucontext.h.html

OK, thank you for that. Jeremy agrees that this is a portable approach,
at least across Linux, FreeBSD and Solaris. We will try to get a fix
into Samba to do it the correct way.
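The difference between the two ways of blocking discussed above can be seen in a small stand-alone program. This is only a sketch under stated assumptions, not the Samba fix: it uses SIGUSR1 rather than an RT signal, and block_method and blocked_after_handler() are made-up names for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical selector: 0 = sigprocmask() in the handler, 1 = edit uc_sigmask. */
static volatile sig_atomic_t block_method;

static void handler(int signum, siginfo_t *info, void *ctx)
{
    (void)info;
    if (block_method == 0) {
        /* Changes the mask only for the remainder of the handler. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        sigprocmask(SIG_BLOCK, &set, NULL);
    } else {
        /* Changes the mask that is installed when the handler returns. */
        ucontext_t *uap = (ucontext_t *)ctx;
        sigaddset(&uap->uc_sigmask, signum);
    }
}

/* Returns 1 if SIGUSR1 is still blocked after the handler returned, else 0. */
int blocked_after_handler(int method)
{
    struct sigaction sa;
    sigset_t set, cur;

    assert(method == 0 || method == 1);

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &set, NULL);   /* start from a known state */

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    block_method = method;
    raise(SIGUSR1);                         /* the handler has run once this returns */

    sigprocmask(SIG_BLOCK, NULL, &cur);     /* a NULL set only queries the mask */
    return sigismember(&cur, SIGUSR1);
}
```

On the platforms discussed in this thread, blocked_after_handler(0) would be expected to return 0 and blocked_after_handler(1) to return 1.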


Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-08 Thread Richard Sharpe

On Tue, 2013-01-08 at 15:02 +0800, David Xu wrote:
 On 2013/01/08 14:33, Richard Sharpe wrote:
  On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
  On 2013/01/08 09:27, Richard Sharpe wrote:
  Hi folks,
 
  I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0
  and I want to check if the assumptions made by the original coder are
  correct.
 
  Essentially, the code queues a number of AIO requests (up to 100) and
  specifies an RT signal to be sent upon completion with siginfo_t.
 
  These are placed into an array.
 
  The code assumes that when handling one of these signals, if it has
  already received N such siginfo_t structures, it can BLOCK further
  instances of the signal while these structures are drained by the main
  code in Samba.
 
  However, my debugging suggests that if a bunch of signals have already
  been queued, you cannot block those undelivered but already queued
  signals.
 
  I am certain that they are all being delivered to the main thread and
  that they keep coming despite the code trying to stop them at 64 (they
  get all the way up to the 100 that were queued.)
 
  Can someone confirm whether I have this correct or not?
 
 
  I am curious how the code BLOCKs the signal in its signal handler.
  AFAIK, after the signal handler returns, the original signal mask is
  restored, which re-enables delivery of the signal, unless you change it
  in ucontext.uc_sigmask.
 
  It does try to block the signals in the signal handler using the
  following code (in the signal handler):
 
  if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
  /* we've filled the info array - block this signal until
 these ones are delivered */
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, signum);
  sigprocmask(SIG_BLOCK, &set, NULL);
 
  However, I also added pthread_sigmask with the same parameters to see if
  that made any difference and it seemed not to.
 
 
 This code won't work: as I said, after the signal handler returns, the
 kernel copies the signal mask contained in the ucontext into kernel
 space and uses it for future signal delivery.
 
 The code should be modified as following:
 
 void handler(int signum, siginfo_t *info, ucontext_t *uap)
 {
 ...
 
   if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
   sigaddset(&uap->uc_sigmask, signum);

Hmmm, this seems unlikely because the signal handler is operating in
user mode and has no access to kernel-mode variables.

I guess I will just have to read the code.




Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-08 Thread Richard Sharpe

On Tue, 2013-01-08 at 07:36 -0800, Adrian Chadd wrote:
 .. or you could abstract it out a bit and use freebsd's
 aio_waitcomplete() or kqueue aio notification.
 
 It'll then behave much saner.

Yes, going forward that is what I want to do ... this would work nicely
with a kqueue back-end for Samba's tevent subsystem, and if someone has
not already written such a back end, I will have to do so, I guess.


Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-08 Thread Richard Sharpe

On Tue, 2013-01-08 at 08:14 -0800, Richard Sharpe wrote:
 On Tue, 2013-01-08 at 15:02 +0800, David Xu wrote:
  On 2013/01/08 14:33, Richard Sharpe wrote:
   On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
   On 2013/01/08 09:27, Richard Sharpe wrote:
   Hi folks,
  
   I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0
   and I want to check if the assumptions made by the original coder are
   correct.
  
   Essentially, the code queues a number of AIO requests (up to 100) and
   specifies an RT signal to be sent upon completion with siginfo_t.
  
   These are placed into an array.
  
   The code assumes that when handling one of these signals, if it has
   already received N such siginfo_t structures, it can BLOCK further
   instances of the signal while these structures are drained by the main
   code in Samba.
  
   However, my debugging suggests that if a bunch of signals have already
   been queued, you cannot block those undelivered but already queued
   signals.
  
   I am certain that they are all being delivered to the main thread and
   that they keep coming despite the code trying to stop them at 64 (they
   get all the way up to the 100 that were queued.)
  
   Can someone confirm whether I have this correct or not?
  
  
   I am curious how the code BLOCKs the signal in its signal handler.
   AFAIK, after the signal handler returns, the original signal mask is
   restored, which re-enables delivery of the signal, unless you change it
   in ucontext.uc_sigmask.
  
   It does try to block the signals in the signal handler using the
   following code (in the signal handler):
  
 if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
 /* we've filled the info array - block this signal until
these ones are delivered */
 sigset_t set;
 sigemptyset(&set);
 sigaddset(&set, signum);
 sigprocmask(SIG_BLOCK, &set, NULL);
  
   However, I also added pthread_sigmask with the same parameters to see if
   that made any difference and it seemed not to.
  
  
  This code won't work: as I said, after the signal handler returns, the
  kernel copies the signal mask contained in the ucontext into kernel
  space and uses it for future signal delivery.
  
  The code should be modified as following:
  
  void handler(int signum, siginfo_t *info, ucontext_t *uap)
  {
  ...
  
  if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
  sigaddset(&uap->uc_sigmask, signum);
 
 Hmmm, this seems unlikely because the signal handler is operating in
 user mode and has no access to kernel-mode variables.

Well, it turns out that your suggestion was correct. 

I did some more searching and found another similar suggestion, so I
gave it a whirl, and it works.

Now, my problem is that Jeremy Allison thinks that it is a fugly hack.
This means that I will probably have big problems getting a patch for
this into Samba.

I guess a couple of questions I have now are:

1. Is this the same for all versions of FreeBSD since Posix RT Signals
were introduced?

2. Which (interpretation of which) combination of standards require such
an approach?


Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-08 Thread Richard Sharpe

On Tue, 2013-01-08 at 22:24 -0500, Daniel Eischen wrote:
 On Tue, 8 Jan 2013, Daniel Eischen wrote:
 
  On Tue, 8 Jan 2013, Richard Sharpe wrote:
 
  [ ... ]
  
  Well, it turns out that your suggestion was correct.
  
  I did some more searching and found another similar suggestion, so I
  gave it a whirl, and it works.
  
  Now, my problem is that Jeremy Allison thinks that it is a fugly hack.
  This means that I will probably have big problems getting a patch for
  this into Samba.
 
  I don't understand why JA thinks this is a hack.  Their current
  method doesn't work, or at least isn't portable.  I've tried this
  on Solaris 10, and it works just as it does in FreeBSD.  Test
  program included after signature.
 
   $ ./test_sigprocmask
   Sending signal 16
   Got signal 16, blocked: true
   Blocking signal 16 using method 0
   Handled signal 16, blocked: false
 
   Sending signal 16
   Got signal 16, blocked: true
   Blocking signal 16 using method 1
   Handled signal 16, blocked: true
 
 Weird - I just tested it on Linux (2.6.18-238.el5) and it works
 the same as FreeBSD and Solaris.  Am I misunderstanding something?
 Is it possible that Samba's code is broken on all platforms?

It is possible :-) 

AIO is off by default in configure. Even after you switch it on in
configure, you still have to switch it on in smb.conf.



Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-07 Thread Richard Sharpe

Hi folks,

I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0
and I want to check if the assumptions made by the original coder are
correct.

Essentially, the code queues a number of AIO requests (up to 100) and
specifies an RT signal to be sent upon completion with siginfo_t.

These are placed into an array.

The code assumes that when handling one of these signals, if it has
already received N such siginfo_t structures, it can BLOCK further
instances of the signal while these structures are drained by the main
code in Samba.

However, my debugging suggests that if a bunch of signals have already
been queued, you cannot block those undelivered but already queued
signals.

I am certain that they are all being delivered to the main thread and
that they keep coming despite the code trying to stop them at 64 (they
get all the way up to the 100 that were queued.)

Can someone confirm whether I have this correct or not?


Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-07 Thread Richard Sharpe

On Mon, 2013-01-07 at 22:24 -0500, Daniel Eischen wrote:
 On Mon, 7 Jan 2013, Richard Sharpe wrote:
 
  Hi folks,
 
  I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0
  and I want to check if the assumptions made by the original coder are
  correct.
 
  Essentially, the code queues a number of AIO requests (up to 100) and
  specifies an RT signal to be sent upon completion with siginfo_t.
 
  These are placed into an array.
 
  The code assumes that when handling one of these signals, if it has
  already received N such siginfo_t structures, it can BLOCK further
  instances of the signal while these structures are drained by the main
  code in Samba.
 
  However, my debugging suggests that if a bunch of signals have already
  been queued, you cannot block those undelivered but already queued
  signals.
 
  I am certain that they are all being delivered to the main thread and
  that they keep coming despite the code trying to stop them at 64 (they
  get all the way up to the 100 that were queued.)
 
  Can someone confirm whether I have this correct or not?
 
 If true, could they not use sigwaitinfo() from a separate
 thread instead and just bypass having to use a signal
 handler altogether?  That thread can either call sigwaitinfo()
 when it is ready to receive more signals, or block on a
 semaphore/CV/whatever while events are being processed.

So, I guess that what I want is something that will continue to work for
both Linux and FreeBSD with minimal code divergence ... 

I guess I need to write a simpler program to check what the deal is.
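As a starting point for such a program, the sigwaitinfo() idea quoted above can be sketched roughly as follows. The signal is kept blocked so that queued instances stay pending, and the consumer pulls them off at its own pace. Assumptions: this is single-threaded rather than using a separate thread, it uses sigtimedwait() with a zero timeout (the non-blocking sibling of sigwaitinfo()), it relies on the platform queueing RT signals in FIFO order, and queue_signals() and drain_signals() are made-up names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Queue n instances of SIGRTMIN to this process, each carrying an int.
 * Returns the number of instances queued, or -1 if the signal could not
 * be blocked. */
int queue_signals(int n)
{
    sigset_t set;
    union sigval v;
    int i;

    assert(n >= 0);
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    /* Keep the signal blocked so every instance stays pending. */
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    for (i = 0; i < n; i++) {
        v.sival_int = i;
        if (sigqueue(getpid(), SIGRTMIN, v) != 0)
            return i;
    }
    return n;
}

/* Dequeue at most max pending instances without blocking.
 * Stores each carried value in values[] and returns how many were taken. */
int drain_signals(int max, int *values)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { 0, 0 };
    int got = 0;

    assert(max >= 0 && values != NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    while (got < max && sigtimedwait(&set, &info, &zero) == SIGRTMIN)
        values[got++] = info.si_value.sival_int;
    return got;
}
```

Because nothing is delivered through a handler, the consumer decides how many completions to accept per pass, and the rest simply remain queued.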




Re: Is it possible to block pending queued RealTime signals (AIO originating)?

2013-01-07 Thread Richard Sharpe

On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
 On 2013/01/08 09:27, Richard Sharpe wrote:
  Hi folks,
 
  I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0
  and I want to check if the assumptions made by the original coder are
  correct.
 
  Essentially, the code queues a number of AIO requests (up to 100) and
  specifies an RT signal to be sent upon completion with siginfo_t.
 
  These are placed into an array.
 
  The code assumes that when handling one of these signals, if it has
  already received N such siginfo_t structures, it can BLOCK further
  instances of the signal while these structures are drained by the main
  code in Samba.
 
  However, my debugging suggests that if a bunch of signals have already
  been queued, you cannot block those undelivered but already queued
  signals.
 
  I am certain that they are all being delivered to the main thread and
  that they keep coming despite the code trying to stop them at 64 (they
  get all the way up to the 100 that were queued.)
 
  Can someone confirm whether I have this correct or not?
 
 
 I am curious how the code BLOCKs the signal in its signal handler.
 AFAIK, after the signal handler returns, the original signal mask is
 restored, which re-enables delivery of the signal, unless you change it
 in ucontext.uc_sigmask.

It does try to block the signals in the signal handler using the
following code (in the signal handler):

if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
/* we've filled the info array - block this signal until
   these ones are delivered */
sigset_t set;
sigemptyset(&set);
sigaddset(&set, signum);
sigprocmask(SIG_BLOCK, &set, NULL);

However, I also added pthread_sigmask with the same parameters to see if
that made any difference and it seemed not to.



Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-09 Thread Richard Sharpe

On Sun, 2012-12-09 at 00:10 -0800, Alfred Perlstein wrote:
 On 12/8/12 5:05 PM, Richard Sharpe wrote:
  On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
  Hi folks,
 
  Our QA group (at xxx) using Samba and smbtorture has been seeing a
  lot of cases where accept returns ECONNABORTED because the system load
  is high and Samba has a large listen backlog.
 
  Every now and then we get a crash in smbd or in winbindd and winbindd
  complains of too many open files in the system.
 
  In looking at kern_accept, it seems to me that FreeBSD can leak a socket
  when kern_accept calls soaccept on it but gets ECONNABORTED. This error
  is the only error returned from tcp_usr_accept.
 
  It seems like the socket taken off so_comp is never freed in this case
  and that there has been a call on soref on it as well, so that something
  like the following is needed in the error path:
 
   //some-path/freebsd/sys/kern/uipc_syscalls.c#1
  - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
  @@ -433,6 +433,14 @@
*/
   if (name)
   *namelen = 0;
  +   /*
  +* We need to close the socket we unlinked
  +* so we do not leak it.
  +*/
  +   ACCEPT_LOCK();
  +   SOCK_LOCK(so);
  +   soclose(so);
   goto noconnection;
   }
   if (sa == NULL) {
 
  I think an soclose is needed at this point because soisconnected has
  been called on the socket.
 
  Do you think this analysis is reasonable?

  We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
  maybe I am wrong since I am not sure if the fdclose call would free the
  socket, but a quick look suggested that it doesn't.
  The fdclose should properly tear down the file descriptor.  The call
  graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
  soclose() -> sorele() -> sofree() -> sodealloc().
 
  A socket leak would not count against kern.maxfiles unless the file
  descriptor leaks as well.  So it is unlikely that this is the problem.
  OK, thanks for the feedback. I will keep looking.
 
  Samba may open a large number of files (real files and sockets) and
  you may run into the maxfiles limit.  You can check the limit with
  sysctl kern.maxfiles and increase it at boot time in boot/loader.conf
  with kern.maxfiles=10 for example.
  Well, some of the smbds are dying, but it is possible that there is a
  file leak in Samba or our VFS that we are tripping as well.
 
 lsof and sockstat can be helpful.  lsof may be able to help determine if 
 there's a leak because it MAY find sockets not associated with a 
 process.
 
 Hope this helps.

Thanks Alfred. After following through the call graph and confirming
(with the code) that it was correct, I am now pretty convinced that I
was wrong in assuming that it was a socket leak.

However, lsof will be useful in allowing me to see how many FDs each
smbd in this test is using. We have, I am told, kern.maxfiles set to
65536, which I think might be a little low for the test they are
running.
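lsof looks at this from outside the process. For a rough count from inside a process, one can probe each descriptor number with fcntl(). The following is a sketch only: count_open_fds() is a made-up helper, and the cap of 65536 probes is an assumption chosen to keep the loop short, not a system limit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Count descriptors currently open in this process by probing each number. */
int count_open_fds(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    long fd;
    int count = 0;

    /* Assumed cap so the probe stays cheap when the limit is huge or unknown. */
    if (max < 0 || max > 65536)
        max = 65536;
    assert(max > 0);

    for (fd = 0; fd < max; fd++) {
        /* F_GETFD fails with EBADF only when the descriptor is not open. */
        if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF)
            count++;
    }
    return count;
}
```

Logging this value periodically would show whether a process's descriptor usage keeps climbing during a test run.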



Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Richard Sharpe

Hi folks,

Our QA group (at xxx) using Samba and smbtorture has been seeing a
lot of cases where accept returns ECONNABORTED because the system load
is high and Samba has a large listen backlog.
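From the listener's side, ECONNABORTED only means that a queued connection went away before it was accepted, so a listener would normally just try again. A minimal sketch of that, where accept_retry() and its retry bound are made up for illustration and are not Samba's code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Accept one connection, retrying when a queued connection was aborted
 * (ECONNABORTED) or the call was interrupted (EINTR).
 * Returns the new descriptor, or -1 with errno set by accept(). */
int accept_retry(int listen_fd, int max_retries)
{
    int i;

    assert(max_retries >= 0);
    for (i = 0; i <= max_retries; i++) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (errno != ECONNABORTED && errno != EINTR)
            return -1;      /* a real error; errno is left for the caller */
    }
    return -1;              /* retries exhausted; errno holds the last cause */
}
```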

Every now and then we get a crash in smbd or in winbindd and winbindd
complains of too many open files in the system.

In looking at kern_accept, it seems to me that FreeBSD can leak a socket
when kern_accept calls soaccept on it but gets ECONNABORTED. This error
is the only error returned from tcp_usr_accept.

It seems like the socket taken off so_comp is never freed in this case
and that there has been a call on soref on it as well, so that something
like the following is needed in the error path:

 //some-path/freebsd/sys/kern/uipc_syscalls.c#1
- /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
@@ -433,6 +433,14 @@
 		 */
 		if (name)
 			*namelen = 0;
+		/*
+		 * We need to close the socket we unlinked
+		 * so we do not leak it.
+		 */
+		ACCEPT_LOCK();
+		SOCK_LOCK(so);
+		soclose(so);
 		goto noconnection;
 	}
 	if (sa == NULL) {

I think an soclose is needed at this point because soisconnected has
been called on the socket.

Do you think this analysis is reasonable?

We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
maybe I am wrong since I am not sure if the fdclose call would free the
socket, but a quick look suggested that it doesn't.

 I would appreciate your feedback.


Re: Possible obscure socket leak when system under load and listener is slow to accept

2012-12-08 Thread Richard Sharpe

On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
  Hi folks,
 
  Our QA group (at xxx) using Samba and smbtorture has been seeing a
  lot of cases where accept returns ECONNABORTED because the system load
  is high and Samba has a large listen backlog.
 
  Every now and then we get a crash in smbd or in winbindd and winbindd
  complains of too many open files in the system.
 
  In looking at kern_accept, it seems to me that FreeBSD can leak a socket
  when kern_accept calls soaccept on it but gets ECONNABORTED. This error
  is the only error returned from tcp_usr_accept.
 
  It seems like the socket taken off so_comp is never freed in this case
  and that there has been a call on soref on it as well, so that something
  like the following is needed in the error path:
 
   //some-path/freebsd/sys/kern/uipc_syscalls.c#1
  - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c 
  @@ -433,6 +433,14 @@
   */
  if (name)
  *namelen = 0;
  +   /*
  +* We need to close the socket we unlinked
  +* so we do not leak it.
  +*/
  +   ACCEPT_LOCK();
  +   SOCK_LOCK(so);
  +   soclose(so);
  goto noconnection;
  }
  if (sa == NULL) {
 
  I think an soclose is needed at this point because soisconnected has
  been called on the socket.
 
  Do you think this analysis is reasonable?
  
  We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
  maybe I am wrong since I am not sure if the fdclose call would free the
  socket, but a quick look suggested that it doesn't.
 
 The fdclose should properly tear down the file descriptor.  The call
 graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
 soclose() -> sorele() -> sofree() -> sodealloc().
 
 A socket leak would not count against kern.maxfiles unless the file
 descriptor leaks as well.  So it is unlikely that this is the problem.

OK, thanks for the feedback. I will keep looking.

 Samba may open a large number of files (real files and sockets) and
 you may run into the maxfiles limit.  You can check the limit with
 sysctl kern.maxfiles and increase it at boot time in boot/loader.conf
 with kern.maxfiles=10 for example.

Well, some of the smbds are dying, but it is possible that there is a
file leak in Samba or our VFS that we are tripping as well.


Re: Possible problems with mmap/munmap on FreeBSD ...

2005-03-30 Thread Richard Sharpe

On Tue, 29 Mar 2005, David Schultz wrote:

 On Tue, Mar 29, 2005, Richard Sharpe wrote:
  Hi,
 
  I am having some problems with the tdb package on FreeBSD 4.6.2 and 4.10.
 
  One of the things the above package does is:
 
 mmap the tdb file to a region of memory
 store stuff in the region (memmov etc).
 when it needs to extend the size of the region {
   munmap the region
   write data at the end of the file
   mmap the region again with a larger size
 }
 
  What I am seeing is that after the munmap the data written to the region
  is gone.
 
  However, if I insert an msync before the munmap, everything is nicely
  coherent. This seems odd (in the sense that it works without the msync
  under Linux).
 
  The region is mmapped with:
 
 mmap(NULL, tdb->map_size,
  PROT_READ|(tdb->read_only? 0:PROT_WRITE),
  MAP_SHARED|MAP_FILE, tdb->fd, 0);

 It looks like all of the underlying pages are getting invalidated
 in vm_object_page_remove().  This is clearly the right thing to do
 for private mappings, but it seems wrong for shared mappings.
 Perhaps Alan has some insight.

OK, I wrote a simple test program that:

  writes some content C1 to a file
  mmaps the file to S1
  writes content C2 to S1
  munmaps S1
  mmaps the file to S1 again
  compares, and shows the expected behavior
  writes content C1 to S1
  munmaps S1
  mmaps the file to S1 again
  compares, and shows the expected behavior

So, now to do things like extend the file after mmapping etc to see where
the problem lies.
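The first half of that test can be sketched roughly as below. This is a reconstruction under assumptions, not the actual test program: the temporary file path, the 4096-byte region and the name mmap_roundtrip() are all made up, and setup failures simply trip an assert.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION 4096

/* Returns 1 if data stored through a MAP_SHARED mapping is still visible
 * after munmap() followed by a fresh mmap(), 0 if it was lost. */
int mmap_roundtrip(void)
{
    char path[] = "/tmp/mmap_roundtrip.XXXXXX";
    char c1[REGION], c2[REGION];
    char *s1;
    int fd, same;

    memset(c1, 'A', sizeof(c1));
    memset(c2, 'B', sizeof(c2));

    /* Write content C1 to a file. */
    fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);                       /* file goes away once fd is closed */
    assert(write(fd, c1, sizeof(c1)) == (ssize_t)sizeof(c1));

    /* Map the file to S1, write C2 through the mapping, unmap (no msync). */
    s1 = mmap(NULL, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    assert(s1 != MAP_FAILED);
    memcpy(s1, c2, REGION);
    assert(munmap(s1, REGION) == 0);

    /* Map again and compare against C2. */
    s1 = mmap(NULL, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    assert(s1 != MAP_FAILED);
    same = (memcmp(s1, c2, REGION) == 0);

    munmap(s1, REGION);
    close(fd);
    return same;
}
```

A return value of 1 corresponds to the "expected behavior" in the steps above.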

Regards
-
Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org,
sharpe[at]ethereal.com, http://www.richardsharpe.com

Possible problems with mmap/munmap on FreeBSD ...

2005-03-29 Thread Richard Sharpe

Hi,

I am having some problems with the tdb package on FreeBSD 4.6.2 and 4.10.

One of the things the above package does is:

   mmap the tdb file to a region of memory
   store stuff in the region (memmov etc).
   when it needs to extend the size of the region {
 munmap the region
 write data at the end of the file
 mmap the region again with a larger size
   }

What I am seeing is that after the munmap the data written to the region
is gone.

However, if I insert an msync before the munmap, everything is nicely
coherent. This seems odd (in the sense that it works without the msync
under Linux).

The region is mmapped with:

   mmap(NULL, tdb->map_size,
        PROT_READ|(tdb->read_only? 0:PROT_WRITE),
        MAP_SHARED|MAP_FILE, tdb->fd, 0);

What I notice is that all the calls to mmap return the same address.

A careful reading of the man pages for mmap and munmap does not suggest
that I am doing anything wrong.

Is it possible that FreeBSD is deferring flushing the dirty data, and then
forgets to do it when the same starting address is used etc?
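For reference, the unmap/extend/remap sequence with the msync() workaround added can be sketched like this. It is not the tdb code: remap_larger() and remap_demo() are made-up names, the sizes are arbitrary, and the file is extended by writing zeros at its end.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Unmap, extend the file by writing zeros at its end, and map it again. */
void *remap_larger(int fd, void *old, size_t old_size, size_t new_size)
{
    char zeros[512];
    size_t off, n;
    ssize_t w;

    assert(new_size > old_size);
    memset(zeros, 0, sizeof(zeros));

    /* The workaround: flush dirty pages before dropping the mapping. */
    if (msync(old, old_size, MS_SYNC) != 0)
        return MAP_FAILED;
    if (munmap(old, old_size) != 0)
        return MAP_FAILED;

    /* Write data at the end of the file to extend it. */
    for (off = old_size; off < new_size; off += (size_t)w) {
        n = new_size - off;
        if (n > sizeof(zeros))
            n = sizeof(zeros);
        w = pwrite(fd, zeros, n, (off_t)off);
        if (w <= 0)
            return MAP_FAILED;
    }

    /* Map the region again with the larger size. */
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Self-check on a temporary file: returns 1 if the data survives the remap. */
int remap_demo(void)
{
    char path[] = "/tmp/remap_demo.XXXXXX";
    const size_t small = 4096, large = 8192;
    char *p;
    int fd, ok;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)small) != 0) {
        close(fd);
        return -1;
    }

    p = mmap(NULL, small, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, "tdb", 4);                /* store something in the region */

    p = remap_larger(fd, p, small, large);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    ok = (memcmp(p, "tdb", 4) == 0) && (p[large - 1] == 0);
    munmap(p, large);
    close(fd);
    return ok;
}
```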

Regards
-
Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org,
sharpe[at]ethereal.com, http://www.richardsharpe.com

Re: Error in my C programming

2005-02-20 Thread Richard Sharpe

On Mon, 21 Feb 2005, Kathy Quinlan wrote:

 Peter Jeremy wrote:

  On Mon, 2005-Feb-21 00:22:56 +0800, Kathy Quinlan wrote:
 
 These are some of the errors I get in pairs for each of the above variables:
 
 Wtrend_Drivers.c:15: conflicting types for `Receiver'
 Wtrend_Drivers.h:9: previous declaration of `Receiver'
 
 
  Without knowing exactly what is on those lines, it's difficult to offer
  any concrete suggestions.
 
  Two possible ways forward:
  1) Change the declaration at Wtrend_Drivers.h:9 to be 'extern'
  2) Pre-process the source and have a close look at the definitions and
 declarations for Receiver.  You may have a stray #define that is
 confusing the type or a missing semicolon.
 
  Peter
 
 Here is a section of my code:

 *** Wtrend_Drivers.c ***

 (12)void Reset_Network (unsigned char Network)
 (13)   {
 (14)Length = 0x00;
 (15)Receiver = 0x00;
 (16)Node = 0xFF;
 (17)Command = Reset;
 (18)Make_Packet_Send(Head , Length, Network, Receiver, Node,
 Command, p_Data);
 (19)   }

 *** Wtrend_Drivers.h ***

 unsigned char Length , Network , Receiver , Node , Command = 0x00;

 The above is line 9 of the Wtrend_Drivers.h
 The numbers in () I have added to show the line numbers in Wtrend_Drivers.c

 These are some of the errors I get in pairs for each of the above variables:

 Wtrend_Drivers.c:15: conflicting types for `Receiver'
 Wtrend_Drivers.h:9: previous declaration of `Receiver'

Ummm, move the definition of all those variables to before their first
use and see what that does. Also, check that you do not have an earlier
definition that does not include the extern keyword.
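To illustrate the extern arrangement being suggested, here is a sketch of the usual pattern, shown in one block although it would normally be split between a header and a single .c file. CMD_RESET, Reset_Fields() and Check_Reset() are made up for the example, since the code for Reset and the rest of the driver was not posted.

```c
#include <assert.h>

/* Would live in the header: declarations only, no storage is allocated. */
extern unsigned char Length, Receiver, Node, Command;

/* Would live in exactly one .c file: the definitions, each one initialized.
 * (In "a, b, c = 0x00;" the initializer applies only to the last name.) */
unsigned char Length = 0x00, Receiver = 0x00, Node = 0x00, Command = 0x00;

enum { CMD_RESET = 0x01 };  /* hypothetical value; Reset is not shown in the post */

/* Uses the variables after they have been declared and defined above. */
void Reset_Fields(void)
{
    Length = 0x00;
    Receiver = 0x00;
    Node = 0xFF;
    Command = CMD_RESET;
}

/* Small self-check that the single shared definition is what gets modified. */
void Check_Reset(void)
{
    Reset_Fields();
    assert(Node == 0xFF && Command == CMD_RESET);
}
```

With this layout every file that includes the header sees the same declaration, and only one file provides the definition.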

Regards
-
Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org,
sharpe[at]ethereal.com, http://www.richardsharpe.com

Re: Error in my C programming

2005-02-20 Thread Richard Sharpe

On Sun, 20 Feb 2005, Michael C. Shultz wrote:

   Here is a section of my code:
  
   *** Wtrend_Drivers.c ***
  
   (12)void Reset_Network (unsigned char Network)
   (13)   {
   (14)Length = 0x00;
   (15)Receiver = 0x00;
   (16)Node = 0xFF;
   (17)Command = Reset;
   (18)Make_Packet_Send(Head , Length, Network, Receiver, Node,
   Command, p_Data);
   (19)   }
  
   *** Wtrend_Drivers.h ***
  
   unsigned char Length , Network , Receiver , Node , Command = 0x00;
  
   The above is line 9 of the Wtrend_Drivers.h
   The numbers in () I have added to show the line numbers in
   Wtrend_Drivers.c
  
   These are some of the errors I get in pairs for each of the above
   variables:
  
   Wtrend_Drivers.c:15: conflicting types for `Receiver'
   Wtrend_Drivers.h:9: previous declaration of `Receiver'
  
   I would try putting the variables in the header file on separate
   lines. For example:
  
   unsigned char Length = 0;
   unsigned char Network = 0;
   unsigned char Receiver = 0;
   etc.
 
  Done that to no avail :(
 
  Regards,
 
  Kat.

 I wonder if Receiver is defined in a include file elsewhere? I checked
 all the header files on my system and it isn't, perhaps it is on your
 though? Maybe easier to rename it?

However, the error messages point out that the conflicting definition is
where Receiver is first used in the function in the .c file. If it was
another definition, we would be told of the actual .h file where the
definition came from. I have seen that lots of times :-)

Regards
-
Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org,
sharpe[at]ethereal.com, http://www.richardsharpe.com

Re: Error in my C programming

2005-02-20 Thread Richard Sharpe

On Sun, 20 Feb 2005, Michael C. Shultz wrote:

 *** Wtrend_Drivers.c ***

 (12)void Reset_Network (unsigned char Network)
 (13)   {
 (14)Length = 0x00;
 (15)Receiver = 0x00;
 (16)Node = 0xFF;
 (17)Command = Reset;
 (18)Make_Packet_Send(Head , Length, Network, Receiver,
 Node, Command, p_Data);
 (19)   }

 *** Wtrend_Drivers.h ***

 unsigned char Length , Network , Receiver , Node , Command =
 0x00;

 The above is line 9 of the Wtrend_Drivers.h
 The numbers in () I have added to show the line numbers in
 Wtrend_Drivers.c

 These are some of the errors I get in pairs for each of the
 above variables:

 Wtrend_Drivers.c:15: conflicting types for `Receiver'
 Wtrend_Drivers.h:9: previous declaration of `Receiver'

[Deletia ..]

   I wonder if Receiver is defined in a include file elsewhere? I
   checked all the header files on my system and it isn't, perhaps it
   is on your though? Maybe easier to rename it?
 
  However, the error messages point out that the conflicting definition
  is where Receiver is first used in the function in the .c file. If it
  was another definition, we would be told of the actual .h file where
  the definition came from. I have seen that lots of times :-)
 
  Regards

 Your right.  We do not have enough of her code.  I tried this:

 #include <stdio.h>
 unsigned char Receiver = 0;

 int   main(void)
 {
   Receiver = 0x00;
   printf( "Receiver -=%c\n", Receiver  );
   return(0);
 }

 compiled it with:

 gcc -W -Wall -ansi -pedantic -Wbad-function-cast -Wcast-align \
 -Wcast-qual -Wchar-subscripts -Winline \
 -Wmissing-prototypes -Wnested-externs -Wpointer-arith \
   -Wredundant-decls -Wshadow -Wstrict-prototypes zz.c -o zz

 and no warnings

In private correspondence with the person asking the question it was
indicated that initially GCC was used with no flags.

Regards
-
Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org,
sharpe[at]ethereal.com, http://www.richardsharpe.com

Throughput problems with NFS between Linux and FreeBSD

2003-09-19 Thread Richard Sharpe

Hi,

We recently encountered a problem with NFS throughput between a FreeBSD
server (we are using 4.6.2, but the same code seems to be in 5.1 as well)
and Linux clients.

When using Linux 2.4.19 or 2.4.21 as a client, although this might extend 
to other clients, and copying a large file, you will see the behavior 
shown in 
http://www.richardsharpe.com/ethereal-stuff.html#Time%20Sequence%20Graphs

This happens because Linux hangs onto the ACK for the last segment of a
32kB+header send for a while. The FreeBSD NFS server will not put any more
data in the socket because of an soreserve with a size of 32kB+header, so
it waits for about 39 ms until Linux finally sends the ACK for the last
segment (unless there is data, like another command, going the other way).

Throughput is about 3MB/s on GigE.

The problem seems to be the following code

    if (so->so_type == SOCK_STREAM)
        siz = NFS_MAXPACKET + sizeof (u_long);
    else
        siz = NFS_MAXPACKET;
    error = soreserve(so, siz, siz);

in src/sys/nfs/nfs_syscalls.c.

We added a sysctl to allow finer control over what is passed to soreserve.

With the fix in, it goes up to around wire speed when lots of data is in 
the cache.

This was found by Chandu Gadhiraju with help from others.

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com


Re: Throughput problems with NFS between Linux and FreeBSD

2003-09-19 Thread Richard Sharpe

On Fri, 19 Sep 2003, John-Mark Gurney wrote:

 Richard Sharpe wrote this message on Fri, Sep 19, 2003 at 10:38 -0700:
  We recently encountered a problem with NFS throughput between a FreeBSD 
  server (we are using 4.6.2, but the same code seems to be in 5.1 as well).
  
  When using Linux 2.4.19 or 2.4.21 as a client, although this might extend 
  to other clients, and copying a large file, you will see the behavior 
  shown in 
 
 [...]
 
  The problem seems to be the following code
  
      if (so->so_type == SOCK_STREAM)
          siz = NFS_MAXPACKET + sizeof (u_long);
      else
          siz = NFS_MAXPACKET;
      error = soreserve(so, siz, siz);
  
  in src/sys/nfs/nfs_syscalls.c.
  
  We added a sysctl to allow finer control over what is passed to soreserve.
  
  With the fix in, it goes up to around wire speed when lots of data is in 
  the cache.
 
 What is the fix?  You don't say what adjustments to soreserve's parameters
 are necessary to improve performance?  Have you done testing against other
 clients to see how your changes will affect performance on those machines?

The best fix is:

     if (so->so_type == SOCK_STREAM)
-        siz = NFS_MAXPACKET + sizeof (u_long);
+        siz = NFS_MAXPACKET + sizeof (u_long) + MSS;
     else
         siz = NFS_MAXPACKET;
     error = soreserve(so, siz, siz);

in src/sys/nfs/nfs_syscalls.c.

This should be enough, since the client should only hang onto the ACK for
one segment, and it will work even if you have end-to-end jumbo frames. A
simpler fix might be to replace MSS with 2048.
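
The arithmetic behind the change can be sketched as below. This is illustrative only: the SKETCH_ constants are assumptions standing in for the kernel's NFS_MAXPACKET (32kB of data plus a header allowance), nfs_stream_reserve() is a hypothetical name, and unsigned long stands in for u_long.

```c
#include <stddef.h>

/* Assumed values for illustration only; check the NFS headers on the
 * system in question for the real ones. */
#define SKETCH_NFS_MAXDATA   32768UL  /* 32kB of payload, as in the text */
#define SKETCH_NFS_MAXPKTHDR 404UL    /* assumed header allowance        */
#define SKETCH_NFS_MAXPACKET (SKETCH_NFS_MAXPKTHDR + SKETCH_NFS_MAXDATA)

/* Size to reserve for a stream socket: one full reply, the u_long term
 * from the original code, plus one segment of slack so that a withheld
 * ACK for the final segment no longer leaves the buffer full. */
size_t nfs_stream_reserve(size_t mss)
{
    return SKETCH_NFS_MAXPACKET + sizeof(unsigned long) + mss;
}
```

With mss set to 0 this gives the original reservation; any non-zero value adds exactly that much headroom.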

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com


Re: FreeBSD on Intel Server Board SE7501WV2

2003-09-11 Thread Richard Sharpe

On Thu, 11 Sep 2003, Yaoping Ruan wrote:

 Hello,
 
 We plan to install FreeBSD on an Intel Server Board SE7501WV2 with 1 XEON
 2.4GHz CPU, 4GB PC2100 DDR memory, and a Seagate 120GB ATA 7200RPM
 harddisk, and use the server box for high demand SpecWeb99 tests. Does
 anyone have any experience on this Server Board, and see any compatibility
 problem here? The reason I am asking is that we had some kernel
 compatibility issues on other Intel Board.

Is that the dual-processor-capable board? If so, I have one of those at
home, and it works with FreeBSD 4.7 or so. It certainly has the 7501
chipset in it and 2x1.8GHz Xeons.
 
 An other question about this box is the two on-board Gigabit Network
 Controller. Do they work fine on FreeBSD? May I use fiber Giganet NCI like
 Netgear GA621 on it?

They worked for me. I dunno about the fibre stuff.

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com


Re: Looking for FreeBSD kernel debugging help

2003-06-11 Thread Richard Sharpe

On Wed, 11 Jun 2003, Nat Lanza wrote:

 On Wed, 2003-06-11 at 06:22, Terry Lambert wrote:
  Someone should port the network debugging from Darwin using
  the tiny IP stack from NetBSD.
 
 Well, there's this:
 
 http://ipgdb.sourceforge.net/
 
  IPGDB is a collection of extensions to GDB and FreeBSD-4.3
  to allow two-machine kernel debugging over UDP. It behaves
  much like two-machine kernel debugging over serial ports. 
  
  These extensions can easily be applied to other releases of
  FreeBSD. With a little bit of modification, these extension
  can be applied to other BSD variants.
 
 It hasn't been updated in a while, but it's definitely a start. It works
 pretty well for 4.3, and I know it's been updated to work with 4.6
 (though possibly not in the sourceforge distribution).

I think that Groggy was working on this a while back.

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com


What prevents there from being a race with open(...,O_CREAT |O_EXCL,...)

2003-01-10 Thread Richard Sharpe

Hi,

In looking at vn_open, I see that it calls namei and then a little while 
later calls VOP_CREATE.

If the user did open(..., ... O_CREAT | O_EXCL, ...), what prevents a race 
where one process discovers that the name doesn't already exist but 
another gets in and creates the name?
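
For reference, the user-visible contract that the question relies on can be sketched from userland as follows. This shows only what a caller observes (exactly one creator succeeds and the others see EEXIST); it says nothing about how the kernel enforces it. create_exclusive() is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to create path exclusively. Returns 0 if this caller created the
 * file, 1 if it already existed, and -1 on any other error. */
int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return (errno == EEXIST) ? 1 : -1;
}
```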

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message

Re: pw(8): $ (dollar sign) in username

2002-12-27 Thread Richard Sharpe

On Fri, 27 Dec 2002, Ryan Thompson wrote:

 
 Hi all,
 
 I've recently had the pleasure of configuring a FreeBSD machine as a
 Samba Primary Domain Controller. In smb.conf, one can specify an add
 user script directive to automate the creation of machine accounts.
 Otherwise, you have to manually create accounts for each machine on
 the network. See:
 
   http://us1.samba.org/samba/ftp/docs/htmldocs/Samba-PDC-HOWTO.html
 
 Problem is, smb requires a '$' at the end of the username, which our
 pw(8) doesn't allow.

Well, the $ is only required for machine accounts! In Samba 3.0, and
possibly in the latest 2.2.x releases, there is a separate 'add machine
script' parameter. I think it would be better to simply frob the entry in
the master.passwd in that script.

While I have not tried it myself, I am led to believe that once you edit
the entry and run the appropriate command, things work.
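
If the name check itself were to change, one option is to accept a '$' only as the last character, and only when a machine account is being added. A rough sketch of that rule follows; name_ok() and its forbidden set are illustrative assumptions, not the code or the character list used by pw(8).

```c
#include <string.h>

/* Returns 1 if name is acceptable, 0 otherwise. The forbidden set below is
 * an illustrative assumption, not the list used by pw(8). */
int name_ok(const char *name, int is_machine_account)
{
    static const char forbidden[] = " ,\t:+#%^()!@~*?=|\\/\"";
    size_t len = strlen(name);
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        /* Reject control characters, DEL and anything in the set. */
        if (c < ' ' || c == 127 || strchr(forbidden, c) != NULL)
            return 0;
        /* '$' is allowed only as the final character of a machine name. */
        if (c == '$' && !(is_machine_account && len > 1 && i == len - 1))
            return 0;
    }
    return 1;
}
```

Ordinary user names keep the existing restriction, so an unescaped $ in a shell script cannot end up in one by accident.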
 
 Allowing the $ is a one-character change to usr.sbin/pw/pw_user.c .
 Aside from the obvious pain of accidentally inserting shell variables
 as part of a username if the $ is not escaped, are there any specific
 problems with this change?
 
 Others would probably benefit from this. Is the change worth
 committing? Or would it be better to push this to pw.conf?
 
 --- usr.sbin/pw/pw_user.c.orig  Sat Nov 16 21:55:28 2002
 +++ usr.sbin/pw/pw_user.c   Fri Dec 27 11:17:33 2002
 @@ -1195,7 +1195,7 @@
  pw_checkname(u_char *name, int gecos)
  {
 int l = 0;
 -   char const *notch = gecos ? ":!@" : ",\t:+#%$^()!@~*?=|\\/\"";
 +   char const *notch = gecos ? ":!@" : ",\t:+#%^()!@~*?=|\\/\"";
 
 while (name[l]) {
 if (strchr(notch, name[l]) != NULL || name[l] < ' ' || name[l] == 127 ||
 
 - Ryan
 
 

-- 
Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com



Re: nsswitch help for you?

2002-12-25 Thread Richard Sharpe

On Wed, 25 Dec 2002 [EMAIL PROTECTED] wrote:

 Quoting Danny Braniss [EMAIL PROTECTED]:
 
  what exactly do you want/need?
  danny
 
 Sorry, I'll try to get as specific as I can with my currently limited knowledge 
 of the FBSD source code.
 
 Basically, I would like to know where I can find information on the nsswitch 
 protocol (if that is even such a thing): perhaps a document or standards paper?
 
Well, here is what I think is needed:

  The ability to load shared libraries like /lib/libnss_ldap.so or 
/lib/libnss_winbind.so from within the *pw* and the *gr* calls, under 
control of /etc/nsswitch.conf.
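
A rough sketch of the first half of that, turning one line of /etc/nsswitch.conf into candidate module paths, might look like this. The function names and the fixed 64-byte path buffers are assumptions made for the example; a real implementation would go on to hand each path to dlopen(3) and look up the per-database entry points.

```c
#include <stdio.h>
#include <string.h>

/* Build the module path for one nsswitch source, e.g. "ldap" becomes
 * "/lib/libnss_ldap.so". Returns 0 on success, -1 if buf is too small. */
int nss_module_path(const char *source, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "/lib/libnss_%s.so", source);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Split the sources of a line such as "passwd: files ldap winbind" and
 * store up to max module paths in out. Returns the number stored. */
int nss_modules_for_line(const char *line, char out[][64], int max)
{
    char copy[256];
    char *tok;
    int count = 0;
    const char *colon = strchr(line, ':');

    if (colon == NULL || strlen(colon + 1) >= sizeof(copy))
        return 0;
    strcpy(copy, colon + 1);
    for (tok = strtok(copy, " \t\n"); tok != NULL && count < max;
         tok = strtok(NULL, " \t\n")) {
        if (nss_module_path(tok, out[count], 64) == 0)
            count++;
    }
    return count;
}
```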

My suggestion would be to look at how Linux or Slowaris do this. 

My understanding is that the current nsswitch on FreeBSD is hard-coded as
to the places it can look in, so winbindd cannot be used as a backend,
nor can ldap, etc.

I spent some time looking for the interface spec on the web, but could not 
find one, although you can infer what it looks like by looking at the 
winbind_nss.c code in Samba. I imagine that libc is going to need to be 
modified a bit to handle this, but I am only guessing. 

Regards
-
Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 
sharpe[at]ethereal.com, http://www.richardsharpe.com



Re: [hackers] Re: Netgraph could be a router also.

2002-11-11 Thread Richard Sharpe

On Mon, 11 Nov 2002, David Gilbert wrote:

  Terry == Terry Lambert [EMAIL PROTECTED] writes:
 
 Terry By it, I guess you mean FreeBSD?
 
 Terry What are your performance goals?
 
 Right now, I'd like to see 500 to 600 kpps.
 
 Terry Where is FreeBSD relative to those goals, right now, without
 Terry you doing anything to it?
 
 Without any work, we got 75 kpps.
 
 Terry Where is FreeBSD relative to those goals, right now, if you
 Terry tune it very carefully, but don't hack any code?
 
 With a few patches, including polling and some tuning, we got 150 to
 200 kpps.
 
 Note that we've been focusing on pps, not Mbs.  With 100M cards (what
 we're currently using) we want to focus on getting the routing speed
 up.
 
 One of the largest problems we've found with GigE adapters on FreeBSD
 is that their pps ability (never mind the volume of data) is less than
 half that of the fxp driver.

This is intriguing. I have found with Samba that I am able to achieve 
approx 100MB/s read from cache with 1500B frame sizes (ie, no jumbo 
frames) over a BCM5701 on an 850 MHz PIII with FreeBSD 4.3 and similar 
rates from em0 on a 2GHz P4 with 4.6. Both results were with 1500B frames 
and considerable free CPU (50% on the 850MHz PIII).

However, given that they were full 1500B frames (99%), at least in one 
direction, perhaps that does not count.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], http://www.richardsharpe.com



Re: max phy mem known working with FreeBSD 4.x

2002-11-05 Thread Richard Sharpe

On Tue, 5 Nov 2002, Terry Lambert wrote:

 Matt wrote:
Anyone knows the max physical mem that can be used  with FreeBSD4.3?
 [ ... ]
  any chance going more than 4G?
 
 Sure, if you want to install it to warm things up.
 
 No, if you want to access it; access is limited to 4G, because
 that's 32 bits of address space, and your machine is a 32 bit
 machine.

Well, the P4 does have an address extension that allows addressing of up
to 64GB, called PAE (Physical Address Extension).

However, it requires some work. Would be nice in large Samba servers, 
though, to be able to cache enormous amounts of file data :-)

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], http://www.richardsharpe.com



Re: max phy mem known working with FreeBSD 4.x

2002-11-04 Thread Richard Sharpe

On Mon, 4 Nov 2002, Matt wrote:

 Anyone knows the max physical mem that can be used  with FreeBSD4.3?

Well, we have 4GB in a 4.6.2 system, and I think that we ran 4.3 on those 
systems for a while.

However, you lose anywhere between 128M and 512M because of the PCI 
address space.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], http://www.richardsharpe.com



Re: Disk reliability (was: Tagged Command Queuing or Larger Cache?)

2002-10-29 Thread Richard Sharpe

On Wed, 30 Oct 2002, Greg 'groggy' Lehey wrote:

 On Tuesday, 29 October 2002 at  2:03:50 +, Daniel O'Connor wrote:
  On Tue, 2002-10-29 at 01:54, Kenneth Culver wrote:
  I haven't had any trouble with the WDxxxBB drives - the WDxxxAA drives
  are pretty unreliable though.
 
  Hrmm, I havn't tried those, but just about every WD drive I've used has
  ended up with problems which were of course handled by the warranty, but
  even then, I still had to reinstall the os and pull a bunch of stuff from
  my backups which was a pain to do for each failure. Like I said, just my
  personal experience. I don't think the new 8MB cache drives have been out
  long enough to actually develop the problems I've seen on WD drives
  though.
 
  Yes, but my point is that the AA drives are bad, but the BB drives seem
  good. I have been using them for a while (~1 year) without trouble.
 
 I've had trouble with BB drives.  Given that they have (or had) a 3
 year warranty, 1 year of experience isn't very much to go by.
 
  Personally I find that no HD manufacturer has a good reputation -
  they have all made trashy drives at one point. Give the general time
  it takes for problems to surface vs product lifetimes makes deciding
  what to buy a PITA :(
 
 That's a more valid point.
 
 Note that WD and Seagate have dropped their warranty on IDE drives
 from 3 years to 1 year.  What does this say to you?

Hmmm, from what I remember, they did that for the 5400RPM drives, not the 
7200RPM drives!

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], http://www.richardsharpe.com



Re: *nixopt, an idea

2002-08-29 Thread Richard Sharpe


On Thu, 29 Aug 2002, Koul-Henning Pamp wrote:

 Fellow hackers,
 
 I'm creating a new unix optimizer tool, I've finished version 0.1 and need feedback.

At the risk of feeding the feeble Troll :-)

You would think that if this person has that much time on their hands, 
they would actually go find a project that needs developers and do 
something productive, like, say, Linux, or ... (list of about 1,000 
projects elided).

Them that can, do. Them that can't, talk about it.

Sigh.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Name service switch

2002-07-25 Thread Richard Sharpe


On Wed, 24 Jul 2002, Terry Lambert wrote:

 Richard Sharpe wrote:
  Hmmm, so what you are telling me is that winbindd will not work on FreeBSD
  even under 5.0?
 
 What part of FreeBSD 5.x implements NSS, but does not implement
 IRS was unclear?  8-).
 
 The winbindd program is an NSS interfaced program; therefore it
 should work *fine* in 5.x, since NSS supposedly works fine in 5.x.

Well, you were saying that DSOs were unsafe or some such, so I assumed
that FreeBSD 5.x's NSS did not support DSOs, and thus winbindd.

I guess I should just try it.
 
 -- Terry
 

-- 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Name service switch

2002-07-24 Thread Richard Sharpe


On Wed, 24 Jul 2002, Terry Lambert wrote:

 Paul Khavkine wrote:
  Well the one we have in -CURRENT lacks dynamic module support (as does IRS).
  
  I just wanted to know if there was any issues for not implementing IRS
  before ?
 
 The BIND IRS implementation depends on use of the BIND resolver
 library.  In FreeBSD, the resolver library is integrated into
 libc, so upgrading it is very, very hard compared to what it
 would be if it were boken out into a seperate libresolv.
 
 The use of loadable modules has two problems; the first is that
 it requires that binaries be dynamically, not statically linked,
 because FreeBSD does not support a static libdlopen because of
 how symbol lookups are wedged for things like a NULL parameter,
 and importing of main object symbols by loaded modules (in fact,
 the ELF standard was never intended to support static linking),
 and some programs can not be dynamically linked (anything run
 before /usr is mounted to get lib and libexec).  The second is
 that dynamic linking and modules themselves open you up to
 security exploits based on inherent flaws in the idea in a
 hostile implementation environment.

Hmmm, so what you are telling me is that winbindd will not work on FreeBSD 
even under 5.0?
 
 If you want to find an IRS patch set for FreeBSD, serach for
 the terms irs nss ldap, and it will be in the top 5 or
 so.
 
 -- Terry
 
 

-- 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: tuning for samba

2002-07-14 Thread Richard Sharpe


On Sun, 14 Jul 2002, Doug Barton wrote:

 Chad David wrote:
  
  So, I'm building a new box tonight and was wondering if anybody
  has any tried and true tuning parameters for samba on -stable. 
 
 Since you never got any actual answers to your question, I offer the
 following. The only samba tuning option I've ever seen make a difference
 is enabling socket options = TCP_NODELAY. Also, make sure that newreno

   Default: socket options = TCP_NODELAY

 is turned off on the samba host. As for nfs mount options, I have found
 that -3cisl works best for me, when the servers are sun, or netapp
 boxes. 

How does turning off newreno help? We think we are seeing fast retransmit 
get confused in the presence of dropped packets. Is this possibly related?

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: tuning for samba

2002-07-11 Thread Richard Sharpe


On Wed, 10 Jul 2002, Darren Pilgrim wrote:

 Chad David wrote:
  
  A local company has been having issues with samba for some time (it kills
  an e250, and has seriously stressed an e5000) and I've been telling the
  admin (half seriously) that he should just toss it on a PC with FreeBSD.
  Well they finally got tired of hearing FreeBSD this and FreeBSD that and
  asked me to bring a box in if I was so confident... tomorrow morning at
  9am.  So, I'm building a new box tonight and was wondering if anybody
  has any tried and true tuning parameters for samba on -stable.  They
  currently have ~700 users attached.  The load per user is pretty low
  but just rebooting and handling the reconnects has killed small boxes.
  
  As a side note, the data being served will be attached to the samba server
  via NFS.
 
 The one thing I've seen kill a box besides the reboot-reconnect blast
 is content searches by the Windows Find dialog.  All it takes is one
 user on a fast machine and network link doing the Windows equivalent
 of find / -name * -exec grep foo \{\} \; to run you out of file
 descriptors in a matter of seconds.

Yes, Samba has to do readdir scans to simulate a case-insensitive file 
system on a case-sensitive file system.
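
The general shape of that lookup, sketched in C: try the exact name first and fall back to a full directory scan only on a miss. This illustrates the technique and is not Samba's code; lookup_nocase() is a hypothetical name and the fixed-size path buffer is an assumption.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/* Find wanted in dir. On success, copy the on-disk spelling into found and
 * return 0; return -1 if no entry matches even when case is ignored. */
int lookup_nocase(const char *dir, const char *wanted,
                  char *found, size_t foundlen)
{
    char path[1024];
    struct stat st;
    struct dirent *de;
    DIR *d;
    int rc = -1;

    /* Fast path: the exact case the client sent. */
    snprintf(path, sizeof(path), "%s/%s", dir, wanted);
    if (stat(path, &st) == 0 && strlen(wanted) < foundlen) {
        strcpy(found, wanted);
        return 0;
    }

    /* Slow path: scan every entry, comparing without regard to case. */
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcasecmp(de->d_name, wanted) == 0 &&
            strlen(de->d_name) < foundlen) {
            strcpy(found, de->d_name);
            rc = 0;
            break;
        }
    }
    closedir(d);
    return rc;
}
```

The slow path costs one pass over the whole directory per miss, which is why content searches across large trees hurt so much.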
 
 Samba uses a seperate process for each connection, and Windows opens
 one connection per share.

Yes to the first claim, no to the second. Most definitely not. For a
single client, Windows puts all share access (net use, mounting, whatever
you want to call it) over the single TCP connection to the server.

The only time Windows will create a new connection is if you have given 
the server multiple NetBIOS names, and you use different NetBIOS names to 
access the share. For example, even if the NetBIOS names NB1 and NB2 
translate to the same IP (10.10.10.10), if you do the following:

  net use f: \\nb1\share1
  net use g: \\nb2\share1

the client will establish two different connections. However, that is the 
only way I know to get multiple connections from a client to a server. 
Even Terminal Server multiplexes multiple users over the one TCP 
connection.

 Most Windows users only work on one share
 at a time, so with two open shares on ~700 machines that means ~1400
 connections with roughly half of them idle.  That's a lot of freeable
 RAM should you suddenly need it.

Nope, ~700 connections!

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: tuning for samba

2002-07-11 Thread Richard Sharpe


On Thu, 11 Jul 2002, Darren Pilgrim wrote:

 Richard Sharpe wrote:
  On Wed, 10 Jul 2002, Darren Pilgrim wrote:
   Samba uses a seperate process for each connection, and Windows opens
   one connection per share.
  
  Yes to the first claim, no to the second. Most definitely not. For a
  single client, windows puts all share access (net use, mounting, whatever
  you want to call it) over the single TCP connection to the server.
 
 You're right, sorry.  I had gotten mixed up on the multiple connection
 issue because of my own configuration that results in one share per
 connection.
 
  Nope, ~700 connections!
 
 Even with just one connection per machine, though, you're still going
 to have a significant amount of swappable memory in idle smbd
 processes.

Yes, I agree. That is something I would like to improve by making sure
that as much as possible is shared.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: tuning for samba

2002-07-10 Thread Richard Sharpe


On Wed, 10 Jul 2002, Chad David wrote:

 A local company has been having issues with samba for some time (it kills
 an e250, and has seriously stressed an e5000) and I've been telling the
 admin (half seriously) that he should just toss it on a PC with FreeBSD.
 Well they finally got tired of hearing FreeBSD this and FreeBSD that and
 asked me to bring a box in if I was so confident... tomorrow morning at
 9am.  So, I'm building a new box tonight and was wondering if anybody
 has any tried and true tuning parameters for samba on -stable.  They
 currently have ~700 users attached.  The load per user is pretty low
 but just rebooting and handling the reconnects has killed small boxes.

As others have said, memory is an issue.

In some 'benchmark' testing, I have noticed that FreeBSD holds up pretty 
well to large numbers of connects coming in at one time, say compared to 
Linux. Starting up 100 clients during about two or three seconds (as long 
as it takes to fork 100 processes on the driver) does not kill a FreeBSD 
Samba server as much as it does a Linux server running Linux 2.4.x.

Certainly, a 2GB machine that I regularly test against does not notice the 
smbds start up all that much.

 As a side note, the data being served will be attached to the samba server
 via NFS.

Hmmm, some of the locking stuff might be an issue then ...
 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Memory layout tools ... to tweak Samba's memory footprint

2002-07-10 Thread Richard Sharpe


Hi,

I am looking for tools that will help me find data-structures in Samba 
that should be in shared pages but aren't because they are not marked 
appropriately by the source code.

Are there any tools that will allow me to break out the layout of an 
executable file, but also that will allow me to profile memory usage? 

From the sound of the various discussions I have heard about FreeBSD's VM,
since Samba forks a copy of smbd for each connection, any pages that have
not been written by child smbds will be shared in any case, so maybe
there is not so much to worry about?
 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: tuning for samba

2002-07-10 Thread Richard Sharpe


On Wed, 10 Jul 2002, Chad David wrote:

 On Thu, Jul 11, 2002 at 11:20:51AM +0930, Richard Sharpe wrote:
  
  Certainly, a 2GB machine that I regularly test against does not notice the 
  smbds start up all that much.
 
 I have no real way of testing this type of load here, but first thing tomorrow
 morning I'll know..

Up on samba.org in CVS under cifs-load-gen is a tool that can simulate 
clients. Simulating the startup of 100's of clients and then watching what 
happens to the server is not too hard, as long as you have a driver that 
can withstand the load of that many driver processes starting :-)
 
  
   As a side note, the data being served will be attached to the samba server
   via NFS.
  
  Hmmm, some of the locking stuff might be an issue then ...
 
 This is my biggest concern.  I just don't know what to tune here since
 the data just basically passes straight through the box, and the with
 about of data being served and the access patterns buffering is pointless.
 
 One thing I failed to mention, none of the clients ever write; the system
 is completely read only.
 
 Thanks.
 
 

-- 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Adding readdir entries to the name cache ...

2002-07-05 Thread Richard Sharpe


On Thu, 4 Jul 2002, Terry Lambert wrote:

[major snippage, much useful to think about here ...]

  However, ls seems to call lstat in the same order that the files are in
  the directory, and a normal clock approach to directories would yield
  exactly the same result. Further, in the cases that the user did not want
  a -l, we would avoid adding many potentially useless names to the name
  cache and reducing its performance.
 
 This is because the sort occurs first.  An unsorted ls (which is
 available -- see the man page) doesn't have this issue.

I don't want to start a flame war, but a truss of ls -l shows the 
following:

getdirentries(0x5,0x809d000,0x1000,0x80990b4)    = 4096 (0x1000)
lstat(".gnome",0x809c248)                        = 0 (0x0)
lstat(".mc",0x809c348)                           = 0 (0x0)
lstat(".xinitrc",0x809c44c)                      = 0 (0x0)
lstat("750B.pdf",0x809c54c)                      = 0 (0x0)
lstat("Mail",0x809c648)                          = 0 (0x0)
lstat("nsmail",0x809c748)                        = 0 (0x0)
lstat(".cshrc",0x809c848)                        = 0 (0x0)
lstat(".ssh",0x809c948)                          = 0 (0x0)
lstat(".gnome_private",0x809ca50)                = 0 (0x0)
lstat(".xchat",0x809cb48)                        = 0 (0x0)
lstat(".exmh",0x809cc48)                         = 0 (0x0)
lstat(".ICEauthority",0x809cd50)                 = 0 (0x0)
lstat(".netrc",0x809ce48)                        = 0 (0x0)

This is the same order that 'ls -fal' produced.

This suggests that the ls is doing an unsorted lookup of the info, and 
then sorting. That is the way I would have done it as well.
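
That pattern (stat in directory order, sort afterwards) can be sketched as below. This is an illustration of the ordering only and is not taken from the ls source; struct entry, by_name() and list_sorted() are hypothetical names.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

struct entry {
    char name[256];
    struct stat st;
};

/* qsort() comparator: order entries by name. */
static int by_name(const void *a, const void *b)
{
    return strcmp(((const struct entry *)a)->name,
                  ((const struct entry *)b)->name);
}

/* Read dir, lstat() each entry in the order readdir() returns it, and only
 * then sort by name. Returns the number of entries stored, or -1. */
int list_sorted(const char *dir, struct entry *out, int max)
{
    char path[1024];
    struct dirent *de;
    int n = 0;
    DIR *d = opendir(dir);

    if (d == NULL)
        return -1;
    while (n < max && (de = readdir(d)) != NULL) {
        if (strlen(de->d_name) >= sizeof(out[n].name))
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (lstat(path, &out[n].st) != 0)
            continue;
        strcpy(out[n].name, de->d_name);
        n++;
    }
    closedir(d);
    qsort(out, (size_t)n, sizeof(*out), by_name);
    return n;
}
```

Because the lstat() calls follow directory order, the sequence of name lookups is the same whether or not the output is later sorted.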
 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Adding readdir entries to the name cache ...

2002-07-05 Thread Richard Sharpe


On Thu, 4 Jul 2002, Terry Lambert wrote:

 Richard Sharpe wrote:
 
 Note that you can get another 8-12% by making negative cache entries,
 since DOS/Windows clients tend to search the full path each time,
 even though they have already received a successful response in
 the past (i.e. there is no client caching for this information,
 because there is no distributed coherency protocol that would permit
 server invalidation of the cache).  Unmodified, SVR4 DNLC can not
 support negative cache entries (there need to be two line changes).

Hmmm, I think that the major part of the problem there was that, for 
whatever reason, Barry Feigenbaum of IBM declined to add a Change Working 
Directory or Set Working Directory command to the SMB protocol.

Thus, at least for the SMB protocol, and maybe generally, Windows clients 
must always send the full pathname for every file they want, unless it 
happens to be at the root of the share.

Perhaps I am wrong about that.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Case independent file name searched (was Re: Adding readdir entries to the name cache ...)

2002-07-05 Thread Richard Sharpe


On Thu, 4 Jul 2002, Terry Lambert wrote:

 Richard Sharpe wrote:
  [1] Samba, because it has to support the Windows case insensitive file
  system, must do some pretty ugly things :-) When a client asks that a file
  be opened, for example, Samba tries with exactly the case that was
  presented. If that fails, it must do a readdir scan of the directory so it
  can do a case-insensitive match. So, even negative caching does not buy us
  much in the case of Samba. What would help is a case insensitive
  filesystem.
 
 It is useful to be able to do case sensitive on storage, case insensitive
 on lookup on a per process basis.  The easiest is if you wire this in
 as a flag on the proc itself.  The normal way this is done is a flag to
 sfork, but... it should also be possible to have the proc open itself
 in procfs, and then ioctl() down a flag setting for this.

I have come to the conclusion that Terry is right.

Having watched a cygwin-based build of a package, the behavio[u]r is just 
too ugly. When an include file is looked for, it causes Samba to do 
readdir scans for every directory in the -I chain that the include file is 
not in until it is found. If we could eliminate all those readdir scans 
performance would improve dramatically.

Fundamentally, what I want to support is both UNIX clients (say, via NFS 
etc) and Windows clients to be able to share files in the same directory.

Samba already does case-preserving file name creation, and indeed, the 
problem does not go away even if Samba always case-folds all names to 
lower case, because a UNIX-user or -client might still create two files 
that differ only by the case of one or more characters in their names.

This means that Terry is right when he says I need an IOCTL. Basically, 
normal users get the normal case sensitive file system, while Windows 
clients, via an IOCTL which says, give ME case-independent lookups, get a 
slightly different file system.

To support that, however, I need to change the name cache hash function to 
be case-insensitive (there's more--see below). This means that name cache 
hash chains could get longer. In the worst case, if a file system contains 
large numbers of files with long names, all using the same characters that 
only differ by case of individual characters, the hash chain becomes a 
linear search. However, UNIX file systems generally don't get like that. I 
imagine that the hash chains will grow to no more than twice their current 
size, but will probably grow by a factor close to one.
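
A minimal sketch of a case-insensitive hash, assuming ASCII-only case folding and using 32-bit FNV-1 purely as a stand-in for whatever hash the name cache really uses. ci_name_hash() is a hypothetical name. Names that differ only by case land on the same chain, which is exactly why the chains can grow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1 over the lower-cased bytes of a name (ASCII folding only). */
uint32_t ci_name_hash(const char *name, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV-1 offset basis */
    size_t i;

    assert(name != NULL);
    for (i = 0; i < len; i++) {
        h *= 16777619u;                /* FNV-1 prime */
        h ^= (uint32_t)tolower((unsigned char)name[i]);
    }
    return h;
}
```

Because folding happens before hashing, the exact-case comparison still has to be done while walking the chain.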

Another problem is the extra complexity required in cache_lookup. When we 
want case-insensitive lookups, we have to do extra work, even if we find 
a match in the cache. The problem is with files that differ by only the 
case of one or more characters. When this occurs, my view is that we 
should return the file with the longest string of exactly matching 
characters, however, we might allow the sys admin to set policy, at the 
expense of complicating things.

When we search a hash chain, if we get an exact match, we are done, but if 
we don't get an exact match, we still have to do a readdir scan to find a 
better match, and to ensure that we return consistent results. Similarly,
when we do a readdir scan, if we get an exact match, we are done, but if 
we don't, we need to keep going.
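
The scan policy described above can be sketched like this. It is an illustration under one stated assumption: "longest string of exactly matching characters" is read as the longest exactly matching leading prefix. ci_best_match() and its helpers are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two names ignoring ASCII case; returns 1 if they are equal. */
static int ci_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Length of the leading run where the two names match exactly. */
static size_t exact_prefix(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] && a[n] == b[n])
        n++;
    return n;
}

/*
 * Scan "names" (a directory listing or a hash chain) for "want".
 * An exact match ends the scan at once; otherwise keep going and
 * remember the case-insensitive match with the longest exact prefix.
 * Returns the index of the best match, or -1 if nothing matches.
 */
int ci_best_match(const char *const *names, int count, const char *want)
{
    int best = -1;
    size_t best_len = 0;
    int i;

    assert(names != NULL && want != NULL);
    for (i = 0; i < count; i++) {
        size_t len;

        if (!ci_equal(names[i], want))
            continue;
        len = exact_prefix(names[i], want);
        if (want[len] == '\0')
            return i;                  /* exact match: we are done */
        if (best < 0 || len > best_len) {
            best = i;
            best_len = len;
        }
    }
    return best;
}
```

Note that a non-exact hit cannot be returned until the whole list has been scanned, which is the extra work the paragraph above refers to.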

Another aspect that needs consideration is the effect on negative caching. 
Getting a negative result on the exact name match is no good any longer, 
since there may be a case-insensitive match in the directory. This seems 
to make negative cache entries useless for case-insensitive matching.

Finally, I think that pursuing this subject some more is very important 
from the point of view of constructing high-performance CIFS servers, 
based on Samba or other software, so I would appreciate comments.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Adding readdir entries to the name cache ...

2002-07-05 Thread Richard Sharpe


On Fri, 5 Jul 2002, Garance A Drosihn wrote:

 At 6:29 AM +0930 7/6/02, Richard Sharpe wrote:
 Hmmm, I think that the major part of the problem there was that,
 for whatever reason, Barry Feigenbaum of IBM, declined to add
 a Change Working Directory or Set Working Directory command to
 the SMB protocol.
 
 Thus, at least for the SMB protocol, and maybe generally, Windows
 clients  must always send the full pathname for every file they
 want, unless it happens to be at the root of the share.
 
 Could the unix process for samba fake that?  Keep track of the
 most recently used directory, and when a new request comes in
 split it into directory plus filename, and if the directory
 is the same as the previous one, then just access the filename.
 If the directory is different, try to do a chdir() to the new
 directory, and if that succeeds then save that as the previous
 directory.

Yes it can do that, and should do that. I will have to check what Samba 
does. I know I proposed adding a path cache to smbclient/smbtar so that it 
could avoid repeatedly, and even a cache of one path could make a big 
difference.
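
A cache of one path, as suggested above, might look like the following sketch. struct dir_cache and dir_cache_lookup() are hypothetical names, not Samba code; a real server would chdir() on a miss, whereas this sketch only remembers the directory and counts hits and misses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical one-entry cache of the last directory that was resolved. */
struct dir_cache {
    char dir[1024];
    int valid;
    unsigned long hits, misses;
};

/*
 * Split "path" into directory and leaf name.  Returns a pointer to the
 * leaf and records whether the directory part matched the cached one.
 */
const char *dir_cache_lookup(struct dir_cache *c, const char *path)
{
    const char *slash;
    size_t dlen;

    assert(c != NULL && path != NULL);
    slash = strrchr(path, '/');
    if (slash == NULL)
        return path;              /* file at the root of the share */
    dlen = (size_t)(slash - path);
    if (dlen >= sizeof(c->dir))
        return slash + 1;         /* too long to cache */
    if (c->valid && strlen(c->dir) == dlen &&
        memcmp(c->dir, path, dlen) == 0) {
        c->hits++;                /* same directory as last time */
    } else {
        memcpy(c->dir, path, dlen);
        c->dir[dlen] = '\0';
        c->valid = 1;
        c->misses++;              /* here a server would chdir() */
    }
    return slash + 1;
}
```

For a build that opens many headers from the same include directory, nearly every request after the first would be a hit.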
 
 Or is that more trouble than it's worth?

No, I think it is worth a lot. I suspect Samba already does that. There is 
just so much code to look at. 

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: broadcast packets not reaching sender on if_hwassist capable interfaces

2002-05-31 Thread Richard Sharpe


On Fri, 31 May 2002, mark tinguely wrote:

 
 Summary: FreeBSD 4.3 Broadcast IP datagrams not looping on Broadcom GigE
 card.
 
   Also, why is the decision to loop the packet back made in ether_output 
   rather than in ip_output? Off the top of my head I can't see any 
   particular advantage, but perhaps there is. The disadvantage is that I 
   will have to disable hardware assist for broadcast packets to make things 
   work right.
 
 the bge sets the interface output to be ether_output() which is
 called from the ip_output() after all these flags have been set..

Two things I forgot to mention:

1. The driver we are using is a Broadcom proprietary driver, not the BGE 
driver.

2. Our driver is mistakenly setting the device as being in SIMPLEX mode 
:-)

And, I realized in the shower this morning, if you have to calculate the 
checksums on-board for any reason for that packet, there is no point in 
having the checksums calculated twice.
 
 Are these packets that fail part of a IP fragment? m_copy does not copy
 the hwassist flags. we see the same problem with multicast packets.
 Bill Fenner wrote a similar multicast patch before FreeBSD 4.5-RELEASE,
 but even that fix has not been included into the tree yet either.
 In ip_output, we either need to copy the hwassist flags in 
 m_copy or in the code that follows an m_copy, or force the checksum
 calculations.

Yes. We are investigating fixing our driver to make sure that it sets 
DUPLEX mode since our switch is capable of it.

 if your packets are not part of a fragment, then let me know.

Nope, but thanks for your reply.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



broadcast packets not reaching sender on if_hwassist capable interfaces

2002-05-30 Thread Richard Sharpe


Hi,

I was chasing an interesting problem in our FreeBSD 4.3 codebase today.

Broadcast IP datagrams were not being received by programs on the same 
system the IP datagrams were sent from. They were making it out the wire.

They were being sent on a Broadcom GigE interface that has hardware 
checksumming. 

After a while we realized that what was happening was that the packets 
were being marked for hardware checksumming in ip_output 
(m->m_pkthdr.csum_flags), but the decision about looping a packet back to 
ourselves is made in ether_output, so the packet is never checksummed and 
is discarded in ip_input.
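
Since the looped-back copy never reaches the NIC, nothing fills in its checksum. One possible workaround is to compute the checksum in software for that copy only. The sketch below shows the standard Internet checksum (RFC 1071) as a standalone function; inet_checksum() is a hypothetical name and this is not the kernel's in_cksum().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over "len" bytes.  The result is the
 * 16-bit value as it would be read in network byte order.  The 32-bit
 * accumulator is ample for packet-sized buffers.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];  /* 16-bit words */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);    /* fold carries back in */
    return (uint16_t)~sum;
}
```

A receiver verifies a packet by running the same sum over the data with the checksum included; a result of zero means the packet is intact, which is the test a never-checksummed packet fails.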

Has this been fixed in a subsequent release?

Also, why is the decision to loop the packet back made in ether_output 
rather than in ip_output? Off the top of my head I can't see any 
particular advantage, but perhaps there is. The disadvantage is that I 
will have to disable hardware assist for broadcast packets to make things 
work right.
 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: File locking, closes and performance in a distributed file system env

2002-05-15 Thread Richard Sharpe


On Tue, 14 May 2002, Terry Lambert wrote:

 Richard Sharpe wrote:
  Hmmm, I wasn't very clear ...
  
  What I am proposing is a 'simple' fix that simply changes
  
  p->p_flag |= P_ADVLOCK;
  
  to
  
  fp->l_flag |= P_ADVLOCK;
  
  And never resets it, and then in closef,
  
  if ((fp->l_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
      lf.l_whence = SEEK_SET;
      lf.l_start = 0;
      lf.l_len = 0;
      lf.l_type = F_UNLCK;
      vp = (struct vnode *)fp->f_data;
      (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf,
          F_POSIX);
  }
  
  Which still means that the correct functionality is implemented, but we
  only try to unlock files that have ever been locked before, or where we
  are sharing a file struct with another (related) process and one of them
  has locked the file.
 
 Do you expect to share the same fp between multiple open 
 instances for a given file within a single process?
 
 I think your approach will fail to implement proper POSIX
 file locking semantics.
 
 I really hate POSIX semantics, but you have to implement
 them exactly (at least by default), because programs are
 written to expect them.
 
 Basically, this means that if you open a file twice, lock it
 via the first fd, then close the second fd, all locks are
 released.  In your code, it looks like what would happen is
 that when you closed the second fd, the fp->l_flag won't have
 the bit set.  Correct me if I'm wrong?
 
 The reason for the extra overhead now is that you can't do
 this on an open instance basis because of POSIX, so it does it
 on a process instance basis.

OK, you have convinced me. I have looked at the POSIX spec in this area, 
and agree that I can't do what I want to do.
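
The POSIX behaviour Terry describes can be observed from userspace. The sketch below opens a file twice, locks it through the first descriptor, closes the second, and asks a child process before and after whether the file is still locked. Function names are hypothetical; the file is assumed to exist and be writable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ask from a child process whether anyone holds a lock on "path".
 * Returns 1 if locked, 0 if not, -1 on error.  A child is needed
 * because F_GETLK never reports locks owned by the caller itself.
 */
static int locked_by_other(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl;
        int fd = open(path, O_RDWR);

        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: whole file */
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/*
 * Open "path" twice, lock through fd1, then close fd2.
 * Stores the lock state seen by another process before and after.
 */
int posix_close_drops_locks(const char *path, int *before, int *after)
{
    struct flock fl;
    int fd1 = open(path, O_RDWR);
    int fd2 = open(path, O_RDWR);
    int rc = -1;

    assert(before != NULL && after != NULL);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fd1 >= 0 && fd2 >= 0 && fcntl(fd1, F_SETLK, &fl) == 0) {
        *before = locked_by_other(path);
        close(fd2);               /* fd2 was never used for locking */
        fd2 = -1;
        *after = locked_by_other(path);
        rc = 0;
    }
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0)
        close(fd1);
    return rc;
}
```

On a system with POSIX record-lock semantics, "before" comes back 1 and "after" comes back 0, even though the descriptor that took the lock is still open. That is why a flag on the struct file alone is not enough.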

 The only other alternative is to do it on a vp basis -- and
 since multiple fp's can point to the same vp, your option #2
 will fail, as described above, but my suggestion to do the
 locking locally will associate it with the vp (or the v_data,
 depending on which version of FreeBSD, and where the VOP_ADVLOCK
 hangs the lock list off of: the vnode or the inode) will
 maintain the proper semantics.
 
 Your intent isn't really to avoid the VOP_ADVLOCK call, it's
 to avoid making an RPC call to satisfy the VOP_ADVLOCK call,
 right?

Yes, correct. We will have to do it in the vnode layer as you suggest. 
Currently we are using 4.3 and moving to 4.5, so we will have to figure 
out the differences.

 You can't really avoid *all* the avoidable overhead, without
 restructuring the VOP_ADVLOCK interface, which is politically
 difficult.

I wouldn't want to try. Too much code to change and too much chance of a 
massive screw-up.

Thanks for persevering with me.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



File locking, closes and performance in a distributed file system env

2002-05-14 Thread Richard Sharpe


Hi,

I might be way off base here, but we have run into what looks like a 
performance issue with locking and file closes.

We have implemented a distributed file system and were looking at some 
performance issues.

At the moment, if a process locks a file, the code in kern_descrip.c that 
handles it does the following:

 p->p_flag |= P_ADVLOCK;

to indicate that files might be locked.

Later in closef, we see the following:

   if (p && (p->p_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
       lf.l_whence = SEEK_SET;
       lf.l_start = 0;
       lf.l_len = 0;
       lf.l_type = F_UNLCK;
       vp = (struct vnode *)fp->f_data;
       (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf,
           F_POSIX);
   }

This seems to mean that once a process locks a file, every close after 
that will pay the penalty of calling the underlying vnode unlock call. In 
a distributed file system, with a simple implementation, that could be an 
RPC to the lock manager to implement.

Now, there seem to be a few ways to mitigate this:

1. Keep (more) state at the vnode layer that allows us to not issue a 
network traversing unlock if the file was not locked. This means that any 
process that has opened the file will have to issue the network traversing 
unlock request once the flag is set on the vnode.

2. Place a flag in the struct file structure that keeps the state of any 
locks on the file. This means that any processes that share the struct 
(those related by fork) will need to issue unlock requests if one of them 
locks the file.

3. Change a file descriptor table that hangs off the process structure so 
that it includes state about whether or not this process has locked the 
file.

It seems that each of these reduces the performance penalty that processes 
that might be sharing the file, but which have not locked the file, might 
have to pay.

Option 2 looks easy. 

Are there any comments? 
 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: File locking, closes and performance in a distributed file system env

2002-05-14 Thread Richard Sharpe


On Tue, 14 May 2002, Terry Lambert wrote:

Hmmm, I wasn't very clear ...

What I am proposing is a 'simple' fix that simply changes 

p->p_flag |= P_ADVLOCK;

to

fp->l_flag |= P_ADVLOCK;

And never resets it, and then in closef,

if ((fp->l_flag & P_ADVLOCK) && fp->f_type == DTYPE_VNODE) {
    lf.l_whence = SEEK_SET;
    lf.l_start = 0;
    lf.l_len = 0;
    lf.l_type = F_UNLCK;
    vp = (struct vnode *)fp->f_data;
    (void) VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK, &lf,
        F_POSIX);
}

Which still means that the correct functionality is implemented, but we 
only try to unlock files that have ever been locked before, or where we 
are sharing a file struct with another (related) process and one of them 
has locked the file.


 Richard Sharpe wrote:
  I might be way off base here, but we have run into what looks like a
  performance issue with locking and file closes.
 
 [ ... ]
 
  This seems to mean that once a process locks a file, every close after
  that will pay the penalty of calling the underlying vnode unlock call. In
  a distributed file system, with a simple implementation, that could be an
  RPC to the lock manager to implement.
 
 Yes.  This is pretty much required by the POSIX locking
 semantics, which require that the first close remove all
 locks.  Unfortunately, you can't know on a per process
 basis that there are no locks remaining on *any* vnode for
 a given process, so the overhead is sticky.
 
  Now, there seem to be a few ways to mitigate this:
  
  1. Keep (more) state at the vnode layer that allows us to not issue a
  network traversing unlock if the file was not locked. This means that any
  process that has opened the file will have to issue the network traversing
  unlock request once the flag is set on the vnode.
  
  2. Place a flag in the struct file structure that keeps the state of any
  locks on the file. This means that any processes that share the struct
  (those related by fork) will need to issue unlock requests if one of them
  locks the file.
  
  3. Change a file descriptor table that hangs off the process structure so
  that it includes state about whether or not this process has locked the
  file.
  
  It seems that each of these reduces the performance penalty that processes
  that might be sharing the file, but which have not locked the file, might
  have to pay.
  
  Option 2 looks easy.
  
  Are there any comments?
 
 #3 is really unreasonable.  It implies non-coalescing.  I know that
 CIFS requires this, and so does NFSv4, so it's not an unreasonable
 thing to do eventually (historical behaviour can be maintained by
 removing all locks in the overlap region on an unlock, yielding
 logical coalescing).  The amount of things that will need to be
 touched by this, though, means it's probably not worth doing now.
 
 In reality, for remote FS's, you want to assert the lock locally
 before transitting the network anyway, in case there is a local
 conflict, in which case you avoid propagating the request over
 the network.  For union mounts of local and remote FS's, for
 which there is a local lock against the local FS by another process
 that doesn't respect the union (a legitimate thing to have happen),
 it's actually a requirement, since the remote system may promote
 or coalesce locks, and that means that there is no reverse process
 for a remote success followed by a local failure.
 
 This is basically a twist on #1:
 
 a)Assert the lock locally before asserting it remotely;
   if the assertion fails, then you have avoided a network
   operation which is doomed to failure (the RPC call you
   are trying to avoid is similar).
 
 b)When unlocking, verify that the lock exists locally
   before attempting to deassert it remotely.  This means
   that there is still the same local overhead as there
   always was, but at least you avoid the RPC in the case
   where there are no outstanding locks that will be
   cleared by the call.
 
 I've actually wanted the VOP_ADVLOCK to be veto-based for going
 on 6 years now, to avoid precisely the type of problems you are
 now facing.  If the upper layer code did local assertion on vnodes,
 and called the lower layer code only in the success cases, then the
 implementation would actually be done for you already.
 
 -- Terry
 

-- 
Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: Broadcom BCM5701 Chipset problems

2002-05-13 Thread Richard Sharpe


On Mon, 13 May 2002, David Greenman-Lawrence wrote:

 David Greenman-Lawrence wrote:
  If you aren't using VLAN tagging, you shouldn't care.
  
 No, that is absolutely not correct. The checksum problems happened in many
  situations, depending on the chipset and other factors. The problem that
  resulted in the commit to disable the receive hardware checksum was caused
  by small packets with certain byte patterns, NOT VLAN ENCAPSULATION.
 
 Are you sure you are talking about the Tigon III, and not the Tigon II?
 
Yes, of course. I'm talking specifically about the Broadcom BCM570x. My
 particular experience was with the Syskonnect 9D21 and 9D41 boards which
 both use the Altima chip.

I have seen checksum problems with the 5700 ...

Can you tell me which steppings of the 5701 you are seeing the problems 
with? Is it with 1500-byte frames, jumbo frames, or both?

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Re: sendfile() in tftpd?

2002-04-23 Thread Richard Sharpe


On Tue, 23 Apr 2002, Attila Nagy wrote:

 Hello,
 
  No, sendfile() is only for TCP connections, TFTP is using UDP. If you
  want performance, use something else.
 It's even in the manpage:
 Sendfile() sends a regular file specified by descriptor fd out a stream
 socket specified by descriptor s.
 
 Silly me. BTW, I can't use anything else. Are there any alternatives to
 TFTP for booting machines off the network? (using standard, PC components)

Multicast! BootIX (nee InCom) have support for this in their BootROMS. It 
might not be hard to hack into Etherboot et al.

Regards
-
Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED]



Is there anyway from userspace to get the 'enum vtagtype v_tag' for a vnode for an open file?

2002-01-08 Thread Richard Sharpe


Hi,

I want to find out the file system type from userspace for an open file.

Can I get at this info? The stat call does not give it to me.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba




Re: Is there anyway from userspace to get the 'enum vtagtype v_tag' for a vnode for an open file?

2002-01-08 Thread Richard Sharpe


Richard Sharpe wrote:

 Hi,

 I want to find out the file system type from userspace for an open file.

 Can I get at this info? the stat call does not give it to me. 

Hmmm, getfsspec seems to fill the need. Sorry for the noise.
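
For an already-open descriptor, another route is fstatfs(2). This is a different technique from the getfsspec lookup mentioned above, shown only as a sketch: on the BSDs struct statfs carries the type name in f_fstypename, while on Linux only a numeric magic in f_type is available, so the sketch formats that instead. fs_type_of_fd() and fs_type_of_path() are hypothetical helpers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/vfs.h>
#else
#include <sys/param.h>
#include <sys/mount.h>
#endif

/* Write the file system type of the open file "fd" into buf. */
int fs_type_of_fd(int fd, char *buf, size_t len)
{
    struct statfs sfs;

    assert(buf != NULL && len > 0);
    if (fstatfs(fd, &sfs) < 0)
        return -1;
#if defined(__linux__)
    /* Linux exposes only a numeric magic number. */
    snprintf(buf, len, "0x%lx", (unsigned long)sfs.f_type);
#else
    /* BSD-style statfs carries the type name, e.g. "ufs" or "nfs". */
    snprintf(buf, len, "%s", sfs.f_fstypename);
#endif
    return 0;
}

/* Convenience wrapper: open, query, close. */
int fs_type_of_path(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fs_type_of_fd(fd, buf, len);
    close(fd);
    return rc;
}
```

This reports the file system's type name rather than the kernel's enum vtagtype value itself, which is usually what a userspace program wants.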

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba





Adding si_fd to struct __siginfo ...

2001-12-21 Thread Richard Sharpe


Hi,

One of my tasks is to add oplock support to FreeBSD so that we (Panasas) 
can allow correct caching of files by Windows clients in the presence of 
NFS clients using the same files.

We have a preliminary implementation, based on the Linux implementation, 
but it is a gross hack because there is no way for the kernel, when it 
delivers a signal, to indicate the fd that caused delivery of the signal.

Linux and Solaris have an fd field in struct siginfo_t which allows the 
kernel to indicate, for signals relating to files, which fd the signal 
relates to.

I notice that in FreeBSD struct siginfo_t seems to have int 
__spare__[7]; and would like to use one of those spare fields as si_fd.

While I can do that in our code base, if I want to contribute the OpLock 
code it would be useful if the FreeBSD community finds this change 
agreeable.

Are there any counter suggestions or any big objections?

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba




Re: Adding si_fd to struct __siginfo ...

2001-12-21 Thread Richard Sharpe


Alfred Perlstein wrote:

* Richard Sharpe [EMAIL PROTECTED] [011221 15:11] wrote:

Hi,

One of my tasks is to add oplock support to FreeBSD so that we (Panasas) 
can allow correct caching of files by Windows clients in the presence of 
NFS clients using the same files.

We have a preliminary implementation, based on the Linux implementation, 
but it is a gross hack because there is no way for the kernel, when it 
delivers a signal, to indicate the fd that caused delivery of the signal.

Linux and Solaris have an fd field in struct siginfo_t which allows the 
kernel to indicate, for signals relating to files, to indicate which fd 
the signal relates to.

I notice that in FreeBSD struct siginfo_t seems to have int 
__spare__[7]; and would like to use one of those spare fields as si_fd.

While I can do that in our code base, if I want to contribute the OpLock 
code it would be useful if the FreeBSD community finds this change 
agreeable.

Are there any counter suggestions or any big objections?


There was already a big mess of a discussion about how this would
be much better done via kqueue than with realtime signals.

I guess if you can get a working implementation that is compatible
with the existing interfaces it would work, however it's a _much_
better idea to use kqueue to deliver this sort of notification.

And yes, it has been discussed in the lists already.

OK, I will go and look at the discussion ...

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba





Re: Adding si_fd to struct __siginfo ...

2001-12-21 Thread Richard Sharpe


Mike Barcroft wrote:

Richard Sharpe [EMAIL PROTECTED] writes:

There was already a big mess of a discussion about how this would
be much better done via kqueue than with realtime signals.

I guess if you can get a working implementation that is compatible
with the existing interfaces it would work, however it's a _much_
better idea to use kqueue to deliver this sort of notification.

Well, it turns out that there are two problems with what I suggested: 1, 
signals are lossy, in that if multiple signals occur, only one might be 
delivered; and 2, there is no place to store any signal-related 
information in the kernel, in any case.

So, it seems like kqueue is really the only game in town, but I will 
need an appropriate filter, and it would be nice if I could get some 
sort of async notification that there were events ready to be processed, 
as I really don't want to rewrite Samba completely, just to support 
kqueue ...

Hmmm, perhaps the approach should be to signal that leases/oplocks have 
been broken, but provide the details via kqueue.


And yes, it has been discussed in the lists already.

OK, I will go and look at the discussion ...


Unfortunately this discussion mistakenly took place on a FreeBSD
mailing list intended for administrative-only issues, so it isn't
publicly available on our end.  Luckily, a Samba mailing list was
on the CC line.  You should be able to find it on the
[EMAIL PROTECTED] archives circa September 2001.

Best regards,
Mike Barcroft



-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba





Re: Does anyone know if the Broadcom BCM5700 has problems with HW csum?

2001-12-15 Thread Richard Sharpe


David Greenman wrote:

I am playing with a driver for the Broadcom 5700/5701.

It recognizes the 5700 in my 3Com cards OK, but seems to screw up the 
TCP checksum.

Switching off hardware checksum capability fixes it.

Does anyone know the details of which stepping this stuff worked on?


   I haven't nailed down the problem that I've seen with them to a specific
chipset, but I can confirm that they incorrectly calculate the checksum on
input packets in some cases. It seems to be related to both packet size and
certain TCP options (or lack of them). I've only seen the problem occur with
very small (0-4 byte payload) packets.
   In any case, after discussing this problem with Bill Paul, I disabled
input checksum in the -current driver and intend to merge that to -stable in
a few days.

OK, that makes sense, because I wasn't getting past first base. SYN ACK 
segments were being rejected with bad checksum.

The driver I modified is actually for the 5701, which works fine with 
all checksum offloading enabled.

I will try to disable just receive TCP checksum and see what happens.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba





Does anyone know if the Broadcom BCM5700 has problems with HW csum?

2001-12-14 Thread Richard Sharpe


Hi,

I am playing with a driver for the Broadcom 5700/5701.

It recognizes the 5700 in my 3Com cards OK, but seems to screw up the 
TCP checksum.

Switching off hardware checksum capability fixes it.

Does anyone know the details of which stepping this stuff worked on?

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba 
in 24 Hours, Special Edition, Using Samba




FreeBSD might not have been slower than Linux in the real world ...

2001-12-04 Thread Richard Sharpe


Hi,

As you might have seen, there was a problem in the FreeBSD code that 
Matt Dillon fixed recently.

This problem involved the flag TCP_NODELAY not being propagated across 
an accept() call.

This resulted in tbench runs turning in very poor performance under 
FreeBSD compared with Linux.

However, in the real world, ie in Samba, this might not have been a 
problem at all. I believe, but will not be able to check for a little 
while now, that Samba was doing the setsockopt() call after the accept() 
call, and indeed, after the fork() call when a new smbd is fork'd to 
handle the new connection. Since TCP_NODELAY is Samba's default socket option, Samba under 
FreeBSD was probably always getting the benefit of that 68Mb/s that it 
seems possible to get using the SMB protocol on a 100Mb/s link.
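
Setting the option on the connected socket itself, rather than relying on inheritance from the listening socket, can be sketched as below. set_nodelay() and get_nodelay() are hypothetical helpers, not Samba functions; reading the flag back is one way to check whether a given kernel propagated it across accept().

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Disable Nagle on one socket.  Returns 0 on success, -1 on error. */
int set_nodelay(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Read the flag back.  Returns 1 if set, 0 if clear, -1 on error. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A server that calls set_nodelay() on the descriptor returned by accept() gets the same behaviour whether or not the kernel copies the flag from the listener.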

However, it is good that FreeBSD also gets good numbers under the 
benchmarks.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Re: Patch #3 (TCP / Linux / Performance)

2001-12-02 Thread Richard Sharpe


OK Matt, that last patch did the trick.

I am now getting 68 and 69Mb/s between my Linux system and the FreeBSD 
system.

I have also tried the loopback interface, and I am getting 371Mb/s for 1 process, 
dropping to about 320Mb/s for 5.

This seems like it is close to the limit for the machine I am using, as 
CPU hits 100% when I ran the above tbench runs.

I will have to try it with Gigabit Ethernet, but won't be able to do so 
until next week or the week after (after I get to the US).

Does the FreeBSD tcp stack do zero copy (page flip the data to 
userspace)? In the localhost case, it seems like there are two copies 
to/from userspace there.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Samba performance compared between FreeBSD and Linux ...

2001-12-02 Thread Richard Sharpe


Hi,

It seems like all of the issues uncovered have been fixed, so it seems like you cannot 
use performance as a way to choose between FreeBSD and Linux any longer.


I will re-issue my report, but I do not have any more time to spend on this now for 
several days.


I will most likely re-run the tests when I get to the US later this week, and would 
hope to re-issue the report the week after next.


-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Re: Found the problem, w/patch (was Re: FreeBSD performing worse than Linux?)

2001-12-01 Thread Richard Sharpe


Matthew Dillon wrote:

 Index: tcp_output.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
 retrieving revision 1.39.2.10
 diff -u -r1.39.2.10 tcp_output.c
 --- tcp_output.c  2001/07/07 04:30:38 1.39.2.10
 +++ tcp_output.c  2001/11/30 21:18:10
 @@ -912,7 +912,14 @@
          tp->t_flags &= ~TF_ACKNOW;
          if (tcp_delack_enabled)
                  callout_stop(tp->tt_delack);
 +#if 0
 +        /*
 +         * This completely breaks TCP if newreno is turned on
 +         */
          if (sendalot && (!tcp_do_newreno || --maxburst))
 +                goto again;
 +#endif
 +        if (sendalot)
                  goto again;
          return (0);
  }
 
 
OK, I have applied this patch, and FreeBSD 4.4-STABLE now seems to behave 
approximately the same as Linux. There are no extra ACKs, and FreeBSD now coalesces 
pairs of ACKs.


However, performance for one client is still at 25Mb/s with the tbench run, while 
Linux provides around 68Mb/s.


So, it is back to staring at traces. Perhaps I will get a full trace now.


-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Re: UDMA33 and SiS5591 on FreeBSD 4.4-RELEASE

2001-12-01 Thread Richard Sharpe


Greg Lehey wrote:

 On Saturday,  1 December 2001 at 13:05:53 +0100, Søren Schmidt wrote:
 
It seems Zwane Mwaikambo wrote:

Hi,
 I've got a box which boots up with UDMA33 but during the boot
sequence gets write problems and ends up disabling it and i presume
falling back to PIO4. I've tested the same box on Linux 2.4.2+ and have
had no problems running it at UDMA33.

Host: SiS 5591 (revision?)
Disk: Seagate 3.2G ATA2

Ohhh, I need a lot more info before I can tell what's going on..
I need at least the dmesg from a verbosely booted system and
also a pciconf -l to tell what chips you have.

 
 Note that there are other chips out there which return the same PCI
 information but which appear to be capable of ATA 100.  I recently
 gave a patch to Richard Sharpe (copied) which he says was able to get
 his SiS 5591 to run at ATA 100.  I'm still waiting for feedback from
 him before forwarding it to you.  I also have a machine with a SiS
 5591 which can't go beyond ATA 33.  Here are the pciconf outputs for
 each chip:
 
 Mine (ATA 33):
 
 atapci0@pci0:0:1:   class=0x01018a card=0x chip=0x55131039 rev=0xd0 
hdr=0x00
 
 Richard's (ATA 100):
pci0:0:1 Class=0x010180 card=0x55131039 chip=0x55131039 red=0xd0 
hdr=0x00
 
 Dwayne's:
 
 atapci0@pci0:0:1:   class=0x010180 card=0x55131039 chip=0x55131039 rev=0xd0 
hdr=0x00
 
 I don't understand why Richard's output is missing the atapci@ at the
 beginning.  I believe he was using 4.3-RELEASE at this point; mine was
 from 4-STABLE of May this year.

Here is what pciconf gives me:

atapci0@pci0:0:1: class=0x010180 card=0x55131039 chip=0x55131039 
rev=0xd0 hdr=0x00

Attached is the patch I am using, which is based on what Greg gave me. 
It tries UDMA5 first, and steps down ...
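
The step-down idea can be sketched in isolation as follows. This is not the driver code: negotiate_mode, set_mode_fn and accept_up_to are hypothetical names, and accept_up_to merely stands in for the hardware accepting or rejecting a SETFEATURES command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 if the device accepted `mode`. */
typedef int (*set_mode_fn)(int mode, void *ctx);

/* Walk `modes` (ordered fastest first), skip any mode above what the
 * device claims, and return the first mode the callback accepts.
 * Returns -1 if every candidate is rejected. */
int negotiate_mode(const int *modes, size_t n, int claimed,
                   set_mode_fn set_mode, void *ctx)
{
    assert(modes != NULL && set_mode != NULL);
    for (size_t i = 0; i < n; i++) {
        if (claimed < modes[i])
            continue;               /* device does not claim this mode */
        if (set_mode(modes[i], ctx) == 0)
            return modes[i];        /* accepted: stop stepping down */
    }
    return -1;
}

/* Stand-in for real hardware: accepts any mode up to the limit in ctx. */
int accept_up_to(int mode, void *ctx)
{
    return mode <= *(int *)ctx ? 0 : -1;
}
```

With candidates {5, 4, 2} and a claimed mode of 5, a device that only really handles mode 2 ends up at mode 2 after two rejected attempts, which is the behaviour the patch is aiming for.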

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba


--- ata-dma.c.orig  Wed Oct 31 07:29:52 2001
+++ ata-dma.c   Fri Nov 30 14:38:52 2001
@@ -519,30 +519,61 @@
         break;
 
     case 0x55131039:   /* SiS 5591 */
-        if (udmamode >= 2 && pci_get_revid(parent) > 0xc1) {
-            error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
-                                ATA_UDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
-            if (bootverbose)
-                ata_printf(scp, device,
-                           "%s setting UDMA2 on SiS chip\n",
-                           (error) ? "failed" : "success");
-            if (!error) {
-                pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
-                scp->mode[ATA_DEV(device)] = ATA_UDMA2;
-                return;
+        if (bootverbose)
+            printf ("SiS 5513/5591, udmamode %d\n", udmamode);
+        if (pci_get_revid(parent) > 0xc1) {
+            udmamode = 5;   /* Force it to 100 */
+            if (udmamode >= 5) {    /* Claims UDMA 100 */
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA5, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device,
+                               "%s setting UDMA5 on SiS chip\n",
+                               (error) ? "failed" : "success");
+                if (!error) {
+                    pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
+                    scp->mode[ATA_DEV(device)] = ATA_UDMA5;
+                    return;
+                }
             }
-        }
-        if (wdmamode >=2 && apiomode >= 4) {
-            error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
-                                ATA_WDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
-            if (bootverbose)
-                ata_printf(scp, device,
-                           "%s setting WDMA2 on SiS chip\n",
-                           (error) ? "failed" : "success");
-            if (!error) {
-                pci_write_config(parent, 0x40 + (devno << 1), 0x0301, 2);
-                scp->mode[ATA_DEV(device)] = ATA_WDMA2;
-                return;
+            if (udmamode >= 4) {    /* Claims UDMA 66 */
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA4, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device,
+                               "%s setting UDMA4 on SiS chip\n",
+                               (error) ? "failed" : "success");
+                if (!error) {
+                    pci_write_config(parent, 0x40 + (devno << 1), 0xa301, 2);
+                    scp->mode[ATA_DEV(device)] = ATA_UDMA4;
+                    return;
+                }
+            }
+            if (udmamode >= 2) {
+                error = ata_command(scp, device, ATA_C_SETFEATURES, 0, 0, 0,
+                                    ATA_UDMA2, ATA_C_F_SETXFER, ATA_WAIT_READY);
+                if (bootverbose)
+                    ata_printf(scp, device

Re: FreeBSD performing worse than Linux?

2001-11-30 Thread Richard Sharpe


Alfred Perlstein wrote:

 * Richard Sharpe [EMAIL PROTECTED] [011130 15:02] wrote:
The traffic in the tbench case is SMB traffic. Request/response, with a
mixture of small requests and responses, and big request/small response 
or small request/big response, where big is 64K.

I have switched off newreno, and it made no difference. I have switched 
off delayed_ack, and it reduced performance about 5 percent. I have made 
sure that SO_SNDBUF and SO_RCVBUF were set to 131072 (which seems to be 
the max), and it increased performance marginally (like about 2%), but 
consistently.

I am still analysing the packet traces I have, but it seems to me that 
the crucial difference is Linux seems to delay longer before sending 
ACKs, and thus sends fewer ACKs. Since the ACK is piggybacked in the 
response (or the next request), it all works fine, and the 
response/request gets there sooner.

However, I have not convinced myself that the saving of 20uS or so per 
request/response pair accounts for some 40+ Mb/s.


 Can you try these two commands:
 
 sysctl -w net.inet.tcp.recvspace=65536
 sysctl -w net.inet.tcp.sendspace=65536


Yes, that is what I did ... 


-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Re: FreeBSD performing worse than Linux?

2001-11-30 Thread Richard Sharpe


Hi,

I think that there are two different problems here. My situation 
involves a LAN (actually, a crossover cable).

I have captured a trace of a 1 client run between the Linux driver and 
the FreeBSD test system as well as between the Linux driver and the same 
test system running Linux.

I am noticing some interesting things. Linux uses the timestamp option 
in all the TCP segments I have looked at, so it is sending 12 more bytes 
per segment than FreeBSD.

However, more interesting is that for small messages (less than 1460), 
FreeBSD does not seem to delay sending ACKs, so we get the following 
pattern:

FREEBSD

Driver -> Test system: 94 byte IP DG with simulated command
Test System -> Driver: Ack after 83uS
Test System -> Driver: Psh Ack after 29uS with 79 total bytes in IP DG

LINUX

Driver -> Test system: 106 byte IP DG with simulated command
Test System -> Driver: Psh Ack after 89uS with 91 total bytes in IP DG

So, as you can see, Linux seems to shave some time off each transaction 
by avoiding sending extra ACKs.
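
The arithmetic behind that comparison can be written out as a small sketch. It assumes the 29uS on the FreeBSD side is measured from the bare ACK, so that the response leaves 83 + 29 uS after the request; the function and constant names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the trace summary (microseconds, IP datagram bytes). */
enum {
    FREEBSD_ACK_US     = 83,   /* request to bare ACK */
    FREEBSD_PSH_ACK_US = 29,   /* bare ACK to response (assumed relative) */
    LINUX_PSH_ACK_US   = 89,   /* request to response carrying the ACK */
    FREEBSD_REQ_BYTES  = 94,
    LINUX_REQ_BYTES    = 106,
    FREEBSD_RESP_BYTES = 79,
    LINUX_RESP_BYTES   = 91
};

/* Time from request arrival to response departure on each system. */
int freebsd_turnaround_us(void) { return FREEBSD_ACK_US + FREEBSD_PSH_ACK_US; }
int linux_turnaround_us(void)   { return LINUX_PSH_ACK_US; }

/* Extra bytes per datagram on the Linux side, in each direction. */
int request_size_delta(void)  { return LINUX_REQ_BYTES - FREEBSD_REQ_BYTES; }
int response_size_delta(void) { return LINUX_RESP_BYTES - FREEBSD_RESP_BYTES; }

void print_summary(void)
{
    /* Both directions should grow by the same per-segment option size. */
    assert(request_size_delta() == response_size_delta());
    printf("FreeBSD turnaround: %d uS\n", freebsd_turnaround_us());
    printf("Linux turnaround:   %d uS\n", linux_turnaround_us());
    printf("Difference:         %d uS\n",
           freebsd_turnaround_us() - linux_turnaround_us());
    printf("Extra bytes/segment on Linux: %d\n", request_size_delta());
}
```

Under that assumption the turnaround is 112 uS on FreeBSD against 89 uS on Linux, a difference of 23 uS per small transaction, and the 12 byte size difference in both directions matches the timestamp option noted earlier.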

Also, what I am seeing is that neither FreeBSD nor Linux is doing ACK coalescing (if 
that is possible).


While I understand that coalescing ACKs will mess up RTT calculations and SRTT a bit, 
it would serve to reduce the time taken until responses come back.


What I am seeing for large transmits is the following:


FreeBSD (Test)                         Linux (Driver)
                                  <-   Request, 1500 bytes including request
                                       and some data
                                  <-   More segments from the request
Some ACKs                         ->
  (about one every two segments)
                                  <-   Last data segment, usually less than 1500
Lots of ACKs                      ->
  (one per segment, usually with
  a large window, ie 16020 when
  the max window seems to be 16384)
Response                          ->
  (less than 1500)

Now, I have seen something like 10+ ACKS after the driver has finished 
sending. They appear to be one per sent segment. Then the FreeBSD system 
sends its response. The optimal would be for the FreeBSD system to delay 
the ack until it has data to send, which it probably already has.

What I see with the Linux trace is that Linux coalesces ACKs. However, 
the most I have seen it coalesce is two segments.
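
To put rough numbers on the difference between one ACK per segment and one ACK per two segments, here is a small sketch. The 64K request size and the 1460 byte segment payload are assumptions used for illustration (the payload is slightly smaller when the timestamp option is in use), and the helper names are hypothetical.

```c
#include <assert.h>

/* Number of segments needed to carry `bytes` of payload at `mss` bytes each. */
unsigned segments_for(unsigned bytes, unsigned mss)
{
    assert(mss > 0);
    return (bytes + mss - 1) / mss;
}

/* Number of ACKs sent if the receiver acknowledges every `per_ack` segments. */
unsigned acks_for(unsigned segments, unsigned per_ack)
{
    assert(per_ack > 0);
    return (segments + per_ack - 1) / per_ack;
}
```

For a 65536 byte request this gives 45 segments, hence 45 ACKs when every segment is acknowledged and 23 when pairs are coalesced.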


HTH.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



A comparison of Samba performance on FreeBSD 4.3-RELEASE and Linux 2.4.13ac4

2001-11-29 Thread Richard Sharpe


Hi,

attached is a preliminary report on a comparison of Samba performance on 
FreeBSD 4.3-RELEASE and Linux 2.4.13ac4.

I have posted it because I promised to do so; however, I think you 
should take the numbers with a grain of salt.

It demonstrates that overall, for the client tests I did (including up 
to 100 clients, but not reported), on the same hardware (well pretty 
much), the two operating systems are comparable.

Until I resolve issues around why FreeBSD thinks that my SIS730-equipped 
PCchips 810MLR board has a UDMA33 controller only and Linux thinks it is 
capable of UDMA100, the dbench numbers do not mean much.

In addition, I continue to look at the tbench numbers to see what the 
story is with respect to FreeBSD and TCP performance. Perhaps I have 
done something wrong.

Feedback welcome, but I will have limited time to respond over the next 
week.

-- 
Richard Sharpe, [EMAIL PROTECTED], LPIC-1
www.samba.org, www.ethereal.com, SAMS Teach Yourself Samba
in 24 Hours, Special Edition, Using Samba



Measuring the performance of Samba under FreeBSD and Linux
Richard Sharpe
25-Nov-2001

INTRODUCTION
One of the tools available for measuring the performance of Samba and other 
CIFS servers is NetBench, the ZiffDavis benchmark. This benchmark attempts 
to simulate the workload applied to a CIFS server by one or more CIFS clients.

The Samba team has developed a number of tools that can be used to gain a feel
for the ability of various parts of a Samba system (the whole system, the file 
system, or the networking subsystem) to handle the NetBench workload. This 
workload is based on a network trace.

These tools are: smbtorture, which can provide an indication of the ability of
a server to handle the NetBench load from one or more clients; dbench, which 
can provide an indication of the ability of the filesystem on a server to 
handle the workload offered by one or more clients; and tbench, which can 
provide an indication of the ability of the networking subsystem to handle
the workload offered by one or more NetBench clients.

The smbtorture test uses a trace taken from a NetBench run and replays the 
trace for the number of clients specified. Each client uses a separate area on 
the server, but their file system areas are all created in the same directory,
eg \\server\public\CLIENTS\CLIENT0, \\server\public\CLIENTS\CLIENT1, and so on.
The test involves reading and writing large files using large reads (up to 
65535 bytes).

The dbench test takes the NetBench trace and applies all the file IOs that it 
would cause without causing any network traffic or involving any protocol 
handling. Thus it only tests file system performance. It creates its working 
directories in the same way as smbtorture/NetBench.

The tbench test tests out the performance of the networking system between the
two machines. It sends exactly the same amount of data that a NetBench test 
would, but does no file system activity, nor any protocol handling, etc.
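
The request/response pattern that tbench exercises can be illustrated with the sketch below. It is not the tbench source: it uses a local socketpair() and a forked child as a stand-in for a TCP connection between two machines, does no timing, and all names (pingpong, write_full, read_full) are hypothetical. SIGPIPE and EINTR handling are left out for brevity.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly n bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = write(fd, buf, n);
        if (k <= 0)
            return -1;
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Read exactly n bytes; returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = read(fd, buf, n);
        if (k <= 0)
            return -1;
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Run `rounds` exchanges of a req_len byte request and a resp_len byte
 * response. Returns the total number of bytes moved, or -1 on error. */
long pingpong(int rounds, size_t req_len, size_t resp_len)
{
    static char buf[65536];
    int sv[2], status = 0, ok = 1;
    long total = 0;
    pid_t pid;

    if (req_len > sizeof(buf) || resp_len > sizeof(buf))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* child plays the server */
        close(sv[0]);
        for (int i = 0; i < rounds; i++)
            if (read_full(sv[1], buf, req_len) < 0 ||
                write_full(sv[1], buf, resp_len) < 0)
                _exit(1);
        _exit(0);
    }
    close(sv[1]);                   /* parent plays the client */
    for (int i = 0; i < rounds && ok; i++) {
        if (write_full(sv[0], buf, req_len) < 0 ||
            read_full(sv[0], buf, resp_len) < 0)
            ok = 0;
        else
            total += (long)(req_len + resp_len);
    }
    close(sv[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return total;
}
```

Dividing the returned byte count by the elapsed wall-clock time of a call would give a throughput figure comparable in spirit, though not in detail, to what tbench reports.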

I set about to measure the relative ability of FreeBSD 4.3-RELEASE and Linux 
2.4.13-ac4 (on a RedHat 7.2 system) to handle the workload presented by 10, 20,
30, 40, and 50 NetBench clients. To understand better the limits of each operating 
system, I also ran dbench and tbench against both FreeBSD 4.3-RELEASE and Linux 2.4.13.

The results show that both operating systems can provide similar levels of 
performance, so performance is not a metric that can be used to choose between 
them. An interesting result is that Linux seems to be better at driving a 
100Mb/s link, as well as providing higher file system throughput, but tuning 
FreeBSD might improve its performance.

The rest of this report covers the methods that I used to set up and run the 
tests, the results I obtained, and some observations I made while running the 
tests, and provides some conclusions.

METHOD

FreeBSD 4.3-RELEASE was loaded onto a 30GB IBM disk drive, while RedHat 7.2
was loaded onto a 20GB Western Digital drive. The 2.4.7 kernel on the RedHat 
system was replaced with Linux 2.4.13ac4, and the EXT3 file system was used, 
while the FreeBSD system used the standard file system with soft updates 
enabled.

The drives were an IBM DTLA 307030 UDMA33 for FreeBSD and a WDC WD200EM
UDMA100 for Linux.

These drives were then booted on a Duron-750MHz based system with a PC Chips 
motherboard (SIS 730 chipset) with 1GB of memory. The system had two Ethernet 
controllers, both 100Mb/s. The controller used for the test was a 3C905B.

A recent CVS version of Samba (2.2.3pre) was built on each system using the 
same, default, options, and then installed. Similar smb.conf files were built 
for each. The config files are shown in the appendix.

The NetBench tests were all run from the one driver system, a dual-Celeron
533MHz Abit BP6 with 384MB of memory. They were all performed across a single
100Mb/s link.

The dbench tests were run directly on the test system under FreeBSD and Linux.
The dbench source and make were
