Re: [Samba] Need help with file corruption issue

2013-06-03 Thread David Coppit
> So you are creating files on the server side, access it from
> the client side, remove it on the server side again and
> create a new file server side under the same name?

No, this is more serious than that. Please see the strace.txt log. Let
me step you through the last bit:

1) Here, I create a file SdLajo6RXt on the share. I read it from the
raw disk location and also read it from the mounted location, and it
matches.

Same!
/grid/samba_stress_test/SdLajo6RXt :
0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661
/root/grid/samba_stress_test/SdLajo6RXt:
0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

2) Next I delete it

unlink("/grid/samba_stress_test/SdLajo6RXt") = 0

3) Next I create a new file **with a different name**, write to it
directly on disk, and read it from the samba mount:

Different!
/grid/samba_stress_test/85fsYXTNhJ :
0.9504576548397450.9504576548397450.9504576548397450.9504576548397450.950457654839745
/root/grid/samba_stress_test/85fsYXTNhJ:
0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

**Note that the NEW file has incorrect content. It matches the OLD,
DELETED file.** I double-checked the trace, and the filenames in the
trace are all unique.
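
For anyone who wants to repeat the uniqueness check on their own trace, here is a minimal sketch. It assumes the trace contains lines shaped like the unlink() call shown in step 2; the sub names are made up for illustration, and the second sample line is not from the real trace.

```perl
use strict;
use warnings;

# Count how often each path appears in successful unlink() calls.
# Assumes strace lines shaped like:
#   unlink("/grid/samba_stress_test/SdLajo6RXt") = 0
sub unlinked_path_counts {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        $count{$1}++ if $line =~ /\bunlink\("([^"]+)"\)\s*=\s*0/;
    }
    return \%count;
}

# Paths that were unlinked more than once, i.e. names that were reused.
sub reused_paths {
    my ($count) = @_;
    return sort { $a cmp $b } grep { $count->{$_} > 1 } keys %$count;
}

# Demo: the first line is from the trace above, the second is a made-up
# line for illustration. In practice, pass in the lines of strace.txt.
my @sample = (
    'unlink("/grid/samba_stress_test/SdLajo6RXt") = 0',
    'unlink("/grid/samba_stress_test/exampleName") = 0',
);
my @reused = reused_paths(unlinked_path_counts(@sample));
print @reused ? "Reused: @reused\n" : "All unlinked names are unique\n";
```

An empty result only shows that no name was deleted twice; it says nothing about what the client caches.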

When I mounted the share with the "forcedirectio" option, I could not
reproduce the problem.

I would think that the file name is part of the key used for
caching! Is there some way to get visibility into the caching, to see
why it's apparently returning invalid data for a brand new file that
it should have *no* data for?
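
Nothing in this thread establishes what the cache is keyed on. One cheap way to gather evidence is to log the device and inode number of each test file, both at the local path and at the mounted path, and note when a new name shows up with an identity already seen under a different name. The sketch below is only a diagnostic aid: the idea that identity reuse matters is an untested hypothesis, and the sub names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Return "dev:inode" for a path, or undef if stat fails.
sub file_identity {
    my ($path) = @_;
    my @st = stat($path) or return;
    return "$st[0]:$st[1]";
}

# Remember which name last had a given identity. Returns the earlier
# name if this identity was previously seen under a different name,
# otherwise undef.
sub note_identity {
    my ($seen, $name, $identity) = @_;
    my $previous = $seen->{$identity};
    $seen->{$identity} = $name;
    return (defined $previous && $previous ne $name) ? $previous : undef;
}

# Demo in the system temp directory: create and delete a few files and
# report whether any identity comes back under a new name.
my %seen;
for (1 .. 3) {
    my ($fh, $path) = tempfile();
    close $fh;
    my $id   = file_identity($path);
    my $prev = note_identity(\%seen, $path, $id);
    print "$path has identity $id",
          ($prev ? " (previously $prev)" : ""), "\n";
    unlink $path;
}
```

In the stress test, file_identity() would be called on both $filepath and "$mounted_grid_share/$filename" just before the comparison, so that a "Different!" result can be matched against any reuse that was logged.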

> Does the same also happen if you do the file
> creation/deletion via Samba as well?

It does not.

For fun, I self-mapped the share twice and wrote to one mapped share
while reading from the other, to simulate 1 client writing and another
reading. I was able to repro the issue.

I also went ahead and implemented a test where I used winexe to fetch
the file from a Windows machine that had the samba share mounted. I
was *not* able to repro it. So it's possible that there's something
wrong in the Linux cifs module, or it's a race condition and the
latencies of doing the remote command to "type
C:\path\to\mount\samba_stress_test\random_file" mean I can't repro it.
(It's possible that the corrupt files we saw on Windows before were
due to something else.)


On Mon, Jun 3, 2013 at 7:56 AM, Volker Lendecke wrote:
> On Fri, May 31, 2013 at 12:51:40PM -0400, David Coppit wrote:
>> Hey Volker, thanks for the reply.
>>
>> > Can you explain for really stupid people what this does and where the 
>> > problem is?
>>
>> Here's what the perl code is doing:
>>
>> 1) In a loop...
>> 1.1) Write a file to the local disk, using a random filename and 5
>> random floats followed by a newline as the content.
>> 1.2) chown the file so that the samba mount user can read it
>> 1.3) Read that file from a cifs mount of that very same local disk
>> location, hosted by samba
>> 1.4) Compare the written content versus the read content, exiting if
>> they are different.
>> 1.5) Delete the temp file
>
> So you are creating files on the server side, access it from
> the client side, remove it on the server side again and
> create a new file server side under the same name? I would
> really think this is a caching issue, the client does not
> notice the file changed. The wireshark trace you sent does
> not contain any file related operations, so this time the
> client did not even ask the server to close and open the
> file again.
>
> Does the same also happen if you do the file
> creation/deletion via Samba as well?
>
> Volker
>
> --
> SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
> phone: +49-551-37-0, fax: +49-551-37-9
> AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
> http://www.sernet.de, mailto:kont...@sernet.de
-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba


Re: [Samba] Need help with file corruption issue

2013-06-02 Thread David Coppit
Hey Volker, thanks for the reply.

> Can you explain for really stupid people what this does and where the problem 
> is?

Here's what the perl code is doing:

1) In a loop...
1.1) Write a file to the local disk, using a random filename and a
random float repeated 5 times, followed by a newline, as the content.
1.2) chown the file so that the samba mount user can read it
1.3) Read that file from a cifs mount of that very same local disk
location, hosted by samba
1.4) Compare the written content versus the read content, exiting if
they are different.
1.5) Delete the temp file

What I see is that most of the time the samba-provided version of the
file is identical, but sometimes it's not. When it's not, the content
appears to be the contents of the previously read (and now deleted!)
temp file. In some failure cases it's a truncated version of the
previously read file. It's definitely not a perl issue since after the
script croaks, I can "cat" the file on both the local disk and the
samba share and the results are different:

# cat /grid/samba_stress_test/85fsYXTNhJ
0.9504576548397450.9504576548397450.9504576548397450.9504576548397450.950457654839745

# cat /root/grid/samba_stress_test/85fsYXTNhJ
0.5406506065286610.5406506065286610.5406506065286610.5406506065286610.540650606528661

> Also, I am a little confused about the scenario

In this test, I did a self-mount to rule out Windows. If you want, I
can try to write this test using Windows instead of the cifs module.
But I'm pretty sure it's not the cifs module, since in our real
system we're obviously not doing a self-mount like this. Instead,
we're mounting the CentOS samba share on a Windows machine. In that
case what we're seeing is a failure to unzip a file because it's
truncated -- same symptom. What we perhaps didn't notice at the time
is that the truncated content we do get may also be wrong -- we
didn't check this.

> It might help if you could send us an strace of that script producing
> the error together with a network trace.

I did the following:

# tshark -p -w wireshark.out port 445 or port 139
# strace perl samba_stress_test.pl > strace.txt 2>&1

Let me know if that's wrong. I'll attach the gzip'd files. Skip past
all the successes to see the failure at the very end.

On Fri, May 31, 2013 at 2:32 AM, Volker Lendecke wrote:
> On Thu, May 30, 2013 at 11:20:24AM -0400, David Coppit wrote:
>> Hi all,
>>
>> I've run into an issue and am wondering if folks can give some advice
>> on how to resolve it.
>>
>> Basically Samba appears to be getting confused, providing some other
>> file's contents.
>>
>> Initially I saw this on a Windows host that has mounted a share from
>> CentOs, but I've been able to repro it on the CentOs host using a
>> self-mount.
>
> Sorry, I don't know perl enough to actually see the sequence
> of events exactly enough. Can you explain for really stupid
> people what this does and where the problem is? It might
> help if you could send us an strace of that script producing
> the error together with a network trace.
>
> Also, I am a little confused about the scenario: You are
> saying that you saw this on a Windows host that has mounted
> a CentOs share? This means that the cifs kernel module is
> not involved at all here?
>
> With best regards,
>
> Volker Lendecke
>
> --
> SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
> phone: +49-551-37-0, fax: +49-551-37-9
> AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
> http://www.sernet.de, mailto:kont...@sernet.de

[Samba] Need help with file corruption issue

2013-05-30 Thread David Coppit
Hi all,

I've run into an issue and am wondering if folks can give some advice
on how to resolve it.

Basically, Samba appears to be getting confused and serving some other
file's contents.

Initially I saw this on a Windows host that has mounted a share from
CentOS, but I've been able to repro it on the CentOS host using a
self-mount.

Here's my test script:

#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw( tempfile );

$| = 1;  # unbuffered output

my $local_grid_share = '/grid/samba_stress_test';
my $mounted_grid_share = '/root/grid/samba_stress_test';

while (1) {
  # One random float repeated 5 times, followed by a newline
  my $content1 = rand() x 5 . "\n";

  # Write the file directly to the local disk
  my ($fh, $filepath) = tempfile( DIR => $local_grid_share );
  print $fh $content1;
  close $fh;

  # Let the samba mount user read the file
  system("chown xen $filepath");

  my ($filename) = $filepath =~ /.*\/(.*)/;

  print "\n$filename... ";

  # Read the same file back through the cifs mount and compare
  if (-f "$mounted_grid_share/$filename") {
    open my $in, '<', "$mounted_grid_share/$filename"
      or die "Cannot open $mounted_grid_share/$filename: $!";
    local $/ = undef;  # slurp the whole file
    my $content2 = <$in>;
    close $in;

    if ($content1 eq $content2) {
      print "Same!\n$filepath :\n",
            "$content1$mounted_grid_share/$filename: $content2";
    } else {
      print "Different!\n$filepath :\n",
            "$content1$mounted_grid_share/$filename: $content2";
      exit;
    }
  } else {
    print "File is missing!\n";
    exit;
  }

  unlink $filepath;
}

Here's the mount command and an illustration of the problem:

# ifconfig | grep inet.addr | grep -v 127.0.0.1
  inet addr:10.0.0.11  Bcast:10.0.0.255  Mask:255.255.255.0

# mount -t cifs -ousername=the_user,password=the_password \
    //10.0.0.11/grid /root/grid

# mkdir /grid/samba_stress_test; chown xen /grid/samba_stress_test

# perl samba_stress_test.pl

udCVYFNkc5... Same!
/grid/samba_stress_test/udCVYFNkc5 :
0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.0739249823781947
/root/grid/samba_stress_test/udCVYFNkc5:
0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.0739249823781947

uETPmRzm99... Different!
/grid/samba_stress_test/uETPmRzm99 :
0.9774832438332160.9774832438332160.9774832438332160.9774832438332160.977483243833216
/root/grid/samba_stress_test/uETPmRzm99:
0.07392498237819470.07392498237819470.07392498237819470.07392498237819470.073924982378#

So the new file supposedly has the content of the previous *deleted*
file. Note that sometimes the content is truncated. (See above -- the
"#" for the next prompt is at the end of the previous line because
there's no newline).
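
To separate the failure modes described above (stale content versus stale and truncated content), the comparison could label each mismatch. The following is a minimal sketch; classify_read and its labels are hypothetical names, not part of the test script.

```perl
use strict;
use warnings;

# Classify what was read through the mount against what was written
# locally and what the previous (now deleted) file contained.
sub classify_read {
    my ($written, $read, $previous) = @_;
    return 'same' if $read eq $written;
    if (defined $previous && length $read) {
        return 'stale' if $read eq $previous;
        # index() == 0 means $read is a prefix of $previous
        return 'stale-truncated' if index($previous, $read) == 0;
    }
    return 'other';
}

# Demo using the values from the failing iteration shown above.
my $previous = '0.0739249823781947' x 5 . "\n";
my $written  = '0.977483243833216' x 5 . "\n";
my $read     = ('0.0739249823781947' x 4) . '0.073924982378';
print classify_read($written, $read, $previous), "\n";  # stale-truncated
```

In the script this would mean keeping $content1 from the previous iteration and passing it as the third argument.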

If I re-share the mount that's on the Windows machine, and mount it in
this Linux machine, then it consistently repros on the second
iteration. With a little effort I can get the file from the Windows
machine and compare it, if that's helpful.

Here is some information about my setup:

# cat /etc/centos-release
CentOS release 6.3 (Final)

# yum list | grep '^samba'
samba.x86_64                        3.5.10-125.el6        @base
samba-client.x86_64                 3.5.10-125.el6        @base
samba-common.x86_64                 3.5.10-125.el6        @base
samba-winbind-clients.x86_64        3.5.10-125.el6        @base
samba4-libs.x86_64                  4.0.0-23.alpha11.el6  @base/$releasever
samba.x86_64                        3.6.9-151.el6         base
samba-client.x86_64                 3.6.9-151.el6         base
samba-common.i686                   3.6.9-151.el6         base
samba-common.x86_64                 3.6.9-151.el6         base
samba-doc.x86_64                    3.6.9-151.el6         base
samba-domainjoin-gui.x86_64         3.6.9-151.el6         base
samba-swat.x86_64                   3.6.9-151.el6         base
samba-winbind.x86_64                3.6.9-151.el6         base
samba-winbind-clients.i686          3.6.9-151.el6         base
samba-winbind-clients.x86_64        3.6.9-151.el6         base
samba-winbind-devel.i686            3.6.9-151.el6         base
samba-winbind-devel.x86_64          3.6.9-151.el6         base
samba-winbind-krb5-locator.x86_64   3.6.9-151.el6         base
samba4.x86_64                       4.0.0-55.el6.rc4      base
samba4-client.x86_64                4.0.0-55.el6.rc4      base
samba4-common.x86_64                4.0.0-55.el6.rc4      base
samba4-dc.x86_64                    4.0.0-55.el6.rc4      base
samba4-dc-libs.x86_64               4.0.0-55.el6.rc4      base
samba4-devel.i686                   4.0.0-23.alpha11.el6  base
samba4-devel.x86_64                 4.0.0-55.el6.rc4      base
samba4-libs.i686                    4.0.0-23.alpha11.el6  base
samba4-libs.x86_64                  4.0.0-55.el6.rc4      base
samba4-pidl.x86_64                  4.0.0-55.el6.rc4      base
samba4-python.x86_64                4.0.0-55.el6.rc4      base
samba4-swat.x86_64                  4.0.0-55.el6.rc4      base
samba4-test.x86_64                  4.0.0-55.el6.rc4      base
samba4-winbind.x86_64               4.0.0-55.el6.rc4      base
sam