Chris de Vidal wrote:
> My first post, for reference:
> http://marc.theaimsgroup.com/?l=samba&m=103535378916869&w=2
>
> When the new NT server's hard drive died, we decided
> to keep hobbling along on Samba. Meanwhile, my
> supervisor was searching around on OpLock issues on
> Google and he saw other people that were having
> similar problems. We disabled all OpLocks (kernel,
> level I and II, kernel at the global level, level X at
> the share) early this morning, and since then things
> have been fine! Yesterday and the day before, the
> problem appeared quickly, so (knock on wood), I think
> we did fix it. Time will tell.

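For reference, the change described in the quoted text would look roughly like the smb.conf fragment below. This is only a sketch: the poster's actual configuration was not shown, so the share name and path are hypothetical, and the placement (kernel oplocks in [global], the other two in the share) simply follows his description.

```ini
[global]
    # Stop Samba from using the kernel's oplock support.
    kernel oplocks = no

# Hypothetical share name and path, for illustration only.
[spool]
    path = /srv/spool
    # Disable exclusive (level I) and level II oplocks for this share.
    oplocks = no
    level2 oplocks = no
```

Testing the settings one at a time would then mean changing a single one of these three lines back to "yes" and watching for the problem to return.
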
Cool. Now it's probably *way* inconvenient, but it would be great to
test thoroughly, then re-enable those oplock settings one at a time to
see if the problem comes back again.

> * We are using RedHat 7.3 (no ACLs included) but
> created a custom kernel (2.4.19) with ext3 ACL support
> and installed all of the userland ACL tools.

Again, sorry about that: I had gotten confused (ACLs with kernel
oplocks) because there was another thread I replied to in which ACLs
were being discussed. And I wasn't awake yet. :(

> * The corruption was missing records. It would
> interrupt the print process and the Opus analysis
> indicated hundreds of records were missing. It would
> happen in random places in print files (hundreds of
> megs to gigs in size), and seldomly would not happen
> at all.

I still don't understand! Ok, the files are not printed on the Samba
host, they are printed through an NT print server, correct? So are you
saying that it's files served by Samba that are being sent to the
printer, and that's where you're losing data?

[ok I just re-read your original post...]

You said that the Samba server is used as a "print spooling area". Can
you elucidate? It seems you are offering a Samba file share, which is
used by another system(s?) for NT's printer spool files.

> If it is _officially_
> recognized by the developers as a caveat, it ought to
> be put into the docs/manpages. I apologize if it IS
> there but I missed it.

There are some "dangerous" smb.conf parameters, and AFAIK (maybe not
infinitely far ;) the Samba Team have documented that they can be
misused in a way that can result in corruption. Did you check the
manual page for smb.conf(5), especially for the parameters having to do
with locking, to check that you weren't doing anything wrong?

> Anyway, it appears to have been fixed. I don't yet
> know what kind of performance hit we will see, but so
> far, so good.

It might not be so bad.
Actually, for large database files, it may speed things up quite a bit
(and avoid problems) to have the oplocks turned off. This is a "known
thing".

[again, after re-reading your original message...]

Aha, you say that the Samba server is serving flat database files. If
those database files are large, this by itself says "turn oplocks off".
And this may apply to the files in the share you're using as a print
spool area, too.

Here is a "sneak preview" excerpt from the second edition of Using
Samba, regarding use of oplocks:

|Generally, we recommend using the defaults provided by Samba:
|standard DOS/Windows deny-mode locks for compatibility and
|oplocks for the extra performance that local caching allows.
|If your operating system can take advantage of oplocks, it
|should provide significant performance improvements.
|
|One very notable exception is large data files, such as those
|used by database software. If a client is allowed to oplock this
|kind of file, there is a huge delay while the client copies the
|entire file from the server in order to cache it, even though it
|may only need to update one record. The situation goes from bad
|to worse when another client tries to open the oplocked file. The
|first client must write the entire file back to the server before
|the second client's file open request can succeed. This results in
|another huge delay (for both clients) which in practice often
|results in a failed open due to a timeout on the second client,
|perhaps along with a message warning of possible database corruption!
|You can set veto oplock files, as in the previous example, to avoid
|this kind of problem.

Just to head off another bunch of comments from the Samba Team, please
understand that just because you get a message from Windows that says
your database is *possibly* corrupt, it doesn't mean that your database
*is* corrupt. OK? ;-)

Aside from that, I welcome any comments on the above excerpt.
It was suggested by David Collier-Brown, my co-author, and he had to
explain it carefully to me before I wrote about it. Any suggestions for
improving the discussion are welcome!

The comment about the 'veto oplock files' parameter applies when you
have just a few files in the share that may be problematic, like a
single, huge file in a share, or some number of files with
somehow-similar names that can be matched using a file globbing pattern
or patterns.

> We might reenable kernel then
> regular then level2 oplocks later to see if it was
> just one particular type.

Pretty please! I'm really curious to find out exactly what was
happening.

Jay Ts
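
For anyone who wants to see the 'veto oplock files' idea from the message above in concrete form, here is a minimal sketch. The share name, path, and file patterns are hypothetical and would need to be replaced with whatever matches the problematic files; the value is a list of glob patterns separated by "/".

```ini
# Hypothetical share name and path, for illustration only.
[data]
    path = /srv/data
    # Oplocks stay at their configured setting for the share as a whole,
    # but are never granted on files whose names match these patterns.
    veto oplock files = /*.mdb/*.MDB/*.dbf/*.DBF/
```

This keeps oplocks available for ordinary files in the share while excluding only the large database-style files the excerpt warns about.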