Re: [Samba] Need help with file corruption issue
> So you are creating files on the server side, access it from
> the client side, remove it on the server side again and
> create a new file server side under the same name?

No, this is much more serious. Please see the strace.txt log. Let me step you through the last bit:

1) Here, I create a file SdLajo6RXt on the share. I read it from the raw disk location and also read it from the mounted location, and it matches.

Same!
/grid/samba_stress_test/SdLajo6RXt : 0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661
/root/grid/samba_stress_test/SdLajo6RXt: 0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

2) Next I delete it:

unlink("/grid/samba_stress_test/SdLajo6RXt") = 0

3) Next I create a new file **with a different name**, write to it directly on disk, and read it from the samba mount:

Different!
/grid/samba_stress_test/85fsYXTNhJ : 0.9504576548397450.9504576548397450.9504576548397450.9504576548397450.950457654839745
/root/grid/samba_stress_test/85fsYXTNhJ: 0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

**Note that the NEW file has incorrect content. It matches the OLD, DELETED file.** I double-checked the trace, and the filenames in the trace are all unique.

I mounted the share using "forcedirectio" and couldn't get it to repro. I would think that the file name is a part of the key used for caching! Is there some way to get visibility into the caching, to see why it's apparently returning invalid data for a brand new file that it should have *no* data for?

> Does the same also happen if you do the file
> creation/deletion via Samba as well?

It does not. For fun, I self-mapped the share twice and wrote to one mapped share while reading from the other, to simulate 1 client writing and another reading. I was able to repro the issue.
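
On the question of visibility: one rough way to see what the client side returns for a file, without instrumenting the cifs module itself, is to compare the size, modification time, and a content hash as seen through the local path and through the mounted path. The following is a minimal Python sketch under that assumption. The helper names `describe` and `compare_views` are hypothetical, and the sketch only reports what each path returns, not why a cache returned it.

```python
import hashlib
import os


def describe(path):
    """Return size, mtime and a SHA-256 of the file as seen via `path`."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    return {
        "size_stat": st.st_size,   # size reported by stat()
        "size_read": len(data),    # number of bytes actually returned by read()
        "mtime": st.st_mtime,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def compare_views(local_path, mounted_path):
    """Compare one file as seen on local disk and through the mount."""
    local = describe(local_path)
    mounted = describe(mounted_path)
    return {
        "local": local,
        "mounted": mounted,
        "same_content": local["sha256"] == mounted["sha256"],
        "same_size": local["size_stat"] == mounted["size_stat"],
        # If stat() and read() disagree on the mounted path, that may hint
        # at stale or truncated data being served for that path.
        "mounted_size_consistent": mounted["size_stat"] == mounted["size_read"],
    }
```

With the paths from the failing case, this would be called as `compare_views("/grid/samba_stress_test/85fsYXTNhJ", "/root/grid/samba_stress_test/85fsYXTNhJ")`. A mismatch in `same_size` would suggest the metadata is stale as well as the data, while a match there with differing content would point at the data alone.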

I also went ahead and implemented a test where I used winexe to fetch the file from a Windows machine that had the samba share mounted. I was *not* able to repro it. So it's possible that there's something wrong in the Linux cifs module, or it's a race condition and the latencies of doing the remote command to "type C:\path\to\mount\samba_stress_test\random_file" mean I can't repro it. (It's possible that the corrupt files we saw on Windows before were due to something else.)

On Mon, Jun 3, 2013 at 7:56 AM, Volker Lendecke wrote:
> On Fri, May 31, 2013 at 12:51:40PM -0400, David Coppit wrote:
>> Hey Volker, thanks for the reply.
>>
>> > Can you explain for really stupid people what this does and where the
>> > problem is?
>>
>> Here's what the perl code is doing:
>>
>> 1) In a loop...
>> 1.1) Write a file to the local disk, using a random filename and 5
>> random floats followed by a newline as the content.
>> 1.2) chown the file so that the samba mount user can read it
>> 1.3) Read that file from a cifs mount of that very same local disk
>> location, hosted by samba
>> 1.4) Compare the written content versus the read content, exiting if
>> they are different.
>> 1.5) Delete the temp file
>
> So you are creating files on the server side, access it from
> the client side, remove it on the server side again and
> create a new file server side under the same name? I would
> really think this is a caching issue, the client does not
> notice the file changed. The wireshark trace you sent does
> not contain any file related operations, so this time the
> client did not even ask the server to close and open the
> file again.
>
> Does the same also happen if you do the file
> creation/deletion via Samba as well?
>
> Volker
>
> --
> SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
> phone: +49-551-37-0, fax: +49-551-37-9
> AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
> http://www.sernet.de, mailto:kont...@sernet.de
--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/options/samba

Re: [Samba] Need help with file corruption issue
Hey Volker, thanks for the reply.

> Can you explain for really stupid people what this does and where the problem
> is?

Here's what the perl code is doing:

1) In a loop...
1.1) Write a file to the local disk, using a random filename and 5 random floats followed by a newline as the content.
1.2) chown the file so that the samba mount user can read it
1.3) Read that file from a cifs mount of that very same local disk location, hosted by samba
1.4) Compare the written content versus the read content, exiting if they are different.
1.5) Delete the temp file

What I see is that most of the time the samba-provided version of the file is identical, but sometimes it's not. When it's not, the content appears to be the contents of the previously read (and now deleted!) temp file. In some failure cases it's a truncated version of the previously read file.

It's definitely not a perl issue since after the script croaks, I can "cat" the file on both the local disk and the samba share and the results are different:

# cat /grid/samba_stress_test/85fsYXTNhJ
0.9504576548397450.9504576548397450.9504576548397450.9504576548397450.950457654839745
# cat /root/grid/samba_stress_test/85fsYXTNhJ
0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

> Also, I am a little confused about the scenario

In this test, I did a self-mount to rule out Windows. If you want I can try to write this test using Windows instead of the cifs module. But I'm pretty sure it's not the cifs module, since in our real system, we're obviously not doing a self-mount like this. Instead we're mounting the CentOS samba share on a Windows machine. In this case what we're seeing is a failure to unzip a file because it's truncated -- same symptom. What perhaps we didn't notice at the time is that maybe the truncated content that we do get is also wrong -- didn't check this.

> It might help if you could send us an strace of that script producing
> the error together with a network trace.

I did the following:

# tshark -p -w wireshark.out port 445 or port 139
# strace perl samba_stress_test.pl > strace.txt 2>&1

Let me know if that's wrong. I'll attach the gzip'd files. Skip past all the successes to see the failure at the very end.

On Fri, May 31, 2013 at 2:32 AM, Volker Lendecke wrote:
> On Thu, May 30, 2013 at 11:20:24AM -0400, David Coppit wrote:
>> Hi all,
>>
>> I've run into an issue and am wondering if folks can give some advice
>> on how to resolve it.
>>
>> Basically Samba appears to be getting confused, providing some other
>> file's contents.
>>
>> Initially I saw this on a Windows host that has mounted a share from
>> CentOS, but I've been able to repro it on the CentOS host using a
>> self-mount.
>
> Sorry, I don't know perl enough to actually see the sequence
> of events exactly enough. Can you explain for really stupid
> people what this does and where the problem is? It might
> help if you could send us an strace of that script producing
> the error together with a network trace.
>
> Also, I am a little confused about the scenario: You are
> saying that you saw this on a Windows host that has mounted
> a CentOS share? This means that the cifs kernel module is
> not involved at all here?
>
> With best regards,
>
> Volker Lendecke

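
For readers who, like Volker, do not read Perl, steps 1.1 to 1.5 of the loop can be sketched in Python roughly as follows. This is not the script that produced the traces. The function name `run_stress_test`, its parameters, and the bounded `iterations` count are assumptions added for illustration. The content mirrors the sample output in the thread, which shows one random float repeated five times, and the chown step is optional here because it needs root.

```python
import os
import random
import shutil
import tempfile


def run_stress_test(local_dir, mounted_dir, iterations=100, owner=None):
    """Write via local_dir, read back via mounted_dir, and compare.

    Returns None if every iteration matched, otherwise a tuple
    (filename, written, read) for the first problem found. `read` is
    None when the file was not visible through mounted_dir at all.
    """
    for _ in range(iterations):
        # 1.1) Random filename; one random float repeated 5 times + newline.
        written = (repr(random.random()) * 5 + "\n").encode()
        fd, local_path = tempfile.mkstemp(dir=local_dir)
        with os.fdopen(fd, "wb") as fh:
            fh.write(written)

        # 1.2) Optionally chown so the mount user can read it (needs root).
        if owner is not None:
            shutil.chown(local_path, user=owner)

        # 1.3) Read the same file through the mounted view of the directory.
        name = os.path.basename(local_path)
        mounted_path = os.path.join(mounted_dir, name)
        if not os.path.isfile(mounted_path):
            return (name, written, None)
        with open(mounted_path, "rb") as fh:
            read = fh.read()

        # 1.4) Stop at the first mismatch, leaving the file for inspection.
        if read != written:
            return (name, written, read)

        # 1.5) Delete the temp file on the local side.
        os.unlink(local_path)
    return None
```

With the paths from this thread, the call would look like `run_stress_test("/grid/samba_stress_test", "/root/grid/samba_stress_test", owner="xen")`. Returning the mismatching pair instead of exiting makes it easier to log the stale content next to the expected content.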
[Samba] Need help with file corruption issue
Hi all,

I've run into an issue and am wondering if folks can give some advice on how to resolve it.

Basically Samba appears to be getting confused, providing some other file's contents.

Initially I saw this on a Windows host that has mounted a share from CentOS, but I've been able to repro it on the CentOS host using a self-mount.

Here's my test script:

#!/usr/bin/perl

use File::Temp qw( tempfile );
use strict;

# Unbuffered output, so progress is visible immediately
$| = 1;

my $local_grid_share = '/grid/samba_stress_test';
my $mounted_grid_share = '/root/grid/samba_stress_test';

while (1) {
  # One random float, repeated 5 times, plus a newline
  my $content1 = rand() x 5 . "\n";

  # Write the file directly to the local disk
  my ($fh, $filepath) = tempfile( DIR => $local_grid_share );
  print $fh $content1;
  close $fh;

  system("chown xen $filepath");

  my ($filename) = $filepath =~ /.*\/(.*)/;

  print "\n$filename... ";

  # Read the same file back through the cifs mount
  if (-f "$mounted_grid_share/$filename") {
    open IN, "$mounted_grid_share/$filename";
    local $/ = undef;
    my $content2 = <IN>;
    close IN;

    if ($content1 eq $content2) {
      print "Same!\n$filepath : $content1$mounted_grid_share/$filename: $content2";
    } else {
      print "Different!\n$filepath : $content1$mounted_grid_share/$filename: $content2";
      exit;
    }
  } else {
    print "File is missing!\n";
    exit;
  }

  unlink $filepath;
}

Here's the mount command and an illustration of the problem:

# ifconfig | grep inet.addr | grep -v 127.0.0.1
inet addr:10.0.0.11 Bcast:10.0.0.255 Mask:255.255.255.0
# mount -t cifs -ousername=the_user,password=the_password //10.0.0.11/grid /root/grid
# mkdir /grid/samba_stress_test; chown xen /grid/samba_stress_test
# perl samba_stress_test.pl

udCVYFNkc5... Same!
/grid/samba_stress_test/udCVYFNkc5 : 0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.0739249823781947
/root/grid/samba_stress_test/udCVYFNkc5: 0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.0739249823781947

uETPmRzm99... Different!
/grid/samba_stress_test/uETPmRzm99 : 0.9774832438332160.9774832438332160.9774832438332160.9774832438332160.977483243833216
/root/grid/samba_stress_test/uETPmRzm99: 0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.073924982378#

So the new file supposedly has the content of the previous *deleted* file. Note that sometimes the content is truncated. (See above -- the "#" for the next prompt is at the end of the previous line because there's no newline).

If I re-share the mount that's on the Windows machine, and mount it in this Linux machine, then it consistently repros on the second iteration. With a little effort I can get the file from the Windows machine and compare it, if that's helpful.

Here is some information about my setup:

# cat /etc/centos-release
CentOS release 6.3 (Final)
# yum list | grep '^samba'
samba.x86_64                        3.5.10-125.el6        @base
samba-client.x86_64                 3.5.10-125.el6        @base
samba-common.x86_64                 3.5.10-125.el6        @base
samba-winbind-clients.x86_64        3.5.10-125.el6        @base
samba4-libs.x86_64                  4.0.0-23.alpha11.el6  @base/$releasever
samba.x86_64                        3.6.9-151.el6         base
samba-client.x86_64                 3.6.9-151.el6         base
samba-common.i686                   3.6.9-151.el6         base
samba-common.x86_64                 3.6.9-151.el6         base
samba-doc.x86_64                    3.6.9-151.el6         base
samba-domainjoin-gui.x86_64         3.6.9-151.el6         base
samba-swat.x86_64                   3.6.9-151.el6         base
samba-winbind.x86_64                3.6.9-151.el6         base
samba-winbind-clients.i686          3.6.9-151.el6         base
samba-winbind-clients.x86_64        3.6.9-151.el6         base
samba-winbind-devel.i686            3.6.9-151.el6         base
samba-winbind-devel.x86_64          3.6.9-151.el6         base
samba-winbind-krb5-locator.x86_64   3.6.9-151.el6         base
samba4.x86_64                       4.0.0-55.el6.rc4      base
samba4-client.x86_64                4.0.0-55.el6.rc4      base
samba4-common.x86_64                4.0.0-55.el6.rc4      base
samba4-dc.x86_64                    4.0.0-55.el6.rc4      base
samba4-dc-libs.x86_64               4.0.0-55.el6.rc4      base
samba4-devel.i686                   4.0.0-23.alpha11.el6  base
samba4-devel.x86_64                 4.0.0-55.el6.rc4      base
samba4-libs.i686                    4.0.0-23.alpha11.el6  base
samba4-libs.x86_64                  4.0.0-55.el6.rc4      base
samba4-pidl.x86_64                  4.0.0-55.el6.rc4      base
samba4-python.x86_64                4.0.0-55.el6.rc4      base
samba4-swat.x86_64                  4.0.0-55.el6.rc4      base
samba4-test.x86_64                  4.0.0-55.el6.rc4      base
samba4-winbind.x86_64               4.0.0-55.el6.rc4      base
sam