Before you read this, I want to state (for reasons listed below) that I don't expect an answer (advice is welcome, but please read this email carefully before answering). I'm sharing this with the community in the hope that better software results from our sad experience...

BACKGROUND

I've been using NT for 4 years, Netware and Linux for 3 years, and Samba for almost 2. I work in the IT department of a medium-sized unit of a global advertising company. We have a Netware and NT environment with a bit of Linux. We installed a 280GB IDE Samba archive server (rare usage) and a 15GB SCSI Mac/Samba file server (medium usage). We also use Samba for more menial tasks like smbmounts and file transfers. We thought we were comfortable with Samba. We knew we were comfortable with other types of file servers.

OUR SETUP

Going from my tired memory:

  Athlon MP 1.8GHz (mem=nopentium)
  2GB ECC SDRAM
  Tyan S2460 (I think?)
  Antec 450W PS
  Lots of cooling
  5 IBM DeskStar 120GB drives with 8MB caches in RAID 5
  3ware 7580 (I think?) 8-port hardware RAID
  3ware hot-swappable drive cages
  Intel e1000 Gigabit NIC, full duplex, 1000MBit, autonegotiation off
  3com Gigabit switch, autonegotiation off
  RedHat 7.3
  Kernel 2.4.19 with ACL support
  ext3 with ACL support
  Samba 2.2.5 with ACL support, installed from a recompiled SRPM from the samba.org FTP site
  Winbind
  NO nfs daemon (I hear it's buggy w/ ACLs)

We have a variety of clients, from DOS and OS/2 to Windows (9x-2000) and Linux. The server acts as a print spooling area (the actual queues are on an NT server) and scratch area for database programmers to manipulate their flat database files. As far as I know, these files are not commonly accessed by more than one user at a time.

THE PROBLEM

For the past year, our heaviest-used Netware server has been under more and more stress: filling up, running out of licenses, slowing down, etc. Preliminary tests using Samba on a fast Linux box showed anywhere from 70% to 1000% speed improvements, depending on the task. The decision was made to switch it to Linux; the whole company is migrating away from Netware and we (as a unit, not speaking for the company) don't want to be completely trapped into Windows if we can help it.

The new hardware arrived and more preliminary tests indicated all looked good. We were set to switch last Saturday night. We turned off logins to the Netware box, backed it up, restored it to the new Linux box, set permissions, then made sure the various computers in the building could log in.

Yesterday, our first day, was rough. For most of the day we fought random slow browsing with no explanation. Clients would appear to lock up for several seconds. We found some misconfigurations in smb.conf, but the problems reappeared after we fixed them. No errors were seen in any machine's logs at debug level 2. I trimmed the smb.conf to a minimal number of options and that seemed to help with the slowness.

Today, however, the problem reappeared a few times with no errors in the logs that we could see. The printers were missing some of the records sent to them to print, something that had never happened with Netware. Every time, the missing records were different. Occasionally, it would work right. Oplocks (kernel, level I and II) were left at their defaults (turned on).

THE OUTCOME

Sadly, tonight we are installing a Windows NT server. Installing a brand new server is actually cheaper for us than the 8 or so hours of downtime to back up the server, install NT on it, and restore the data to it. We don't want to revert to Netware because so many clients (DOS, OS/2, etc.) have been reconfigured to log on only to the domain, and reversing those changes would require many more hours. Also, some files have been added since leaving Netware. We also decided to go with NT because it is more proven in this capacity.

CONCLUSION

To be fair, the problems could be related to some misconfiguration. I have pasted the smb.conf below. I fear it might just be an oplock problem, but it is not clear what would result if more than one user happened to try to write to a file with oplocks disabled. All the advice we found said to leave them on to prevent corruption and to improve performance.
We ran out of time to test with oplocks disabled, and feared what a failure would bring. Running this:

  grep -r -B5 -A5 oplock /var/log/samba/ | grep -B5 -A5 error

produced only 5 of these errors:

  oplock_break: receive_smb error (Connection reset by peer)

all from the same DOS machine, out of 2 days' worth of all machines' logs running at debuglevel 1 (some at level 2). I don't know if that is a good indicator of an oplock problem. I can do other greps on request.

Unfortunately, we can't test out your suggestions in production, and our off-production testing apparently can't stress it well enough. So please just take this email as input - I'm not looking for answers here, though advice is appreciated.

The problem could also have been environment or hardware. We should know soon, as we are going to reinstall the original Samba server with NT, and the problems should reappear if hardware or environment is at fault. If we do find that to be true, I will certainly report our findings to this mailing list.

And perhaps the problem was with ACLs. We couldn't turn them off in production to test that theory.

It is likely that we will try Samba in this capacity again in the future with a more mature version.

Thanks for listening,
/dev/idal

[global]
  server string =
  workgroup = <our domain>
  password server = <our PDC>
  security = domain
  encrypt passwords = yes
  smb passwd file = /etc/samba/smbpasswd
  veto files = /lost+found/
  winbind uid = 10000-20000
  winbind gid = 10000-20000
  winbind separator = +
  create mask = 660
  force create mode = 660
  directory mask = 0770
  force directory mode = 0770
  log file = /var/log/samba/%m.log
  debuglevel = 2

[print]
  path = /share/print
  writeable = yes
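
P.S. For anyone who wants to repeat the log search described above on their own logs, the two-stage grep can be roughly approximated in Python. This is only a sketch under stated assumptions: the names oplock_errors and scan_tree are made up for illustration, matching is case-sensitive like the grep, and unlike the grep it returns only the lines containing "error" that fall within 5 lines of an "oplock" line, without the extra context lines.

```python
import os


def oplock_errors(lines, context=5):
    """Return lines containing 'error' that lie within `context` lines
    of any line containing 'oplock' (hypothetical helper, not part of Samba)."""
    matched = set()
    for i, line in enumerate(lines):
        if "oplock" not in line:
            continue
        start = max(0, i - context)
        end = min(len(lines), i + context + 1)
        for j in range(start, end):
            if "error" in lines[j]:
                matched.add(j)
    # Report in file order, without trailing newlines.
    return [lines[j].rstrip("\n") for j in sorted(matched)]


def scan_tree(root):
    """Apply oplock_errors to every file under `root`; return {path: [lines]}
    for files with at least one hit (hypothetical helper)."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                hits = oplock_errors(handle.readlines())
            if hits:
                results[path] = hits
    return results
```

Calling scan_tree("/var/log/samba") and counting the hits per file would show at a glance whether the errors cluster on one client log, as they did for us with the single DOS machine.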