Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Yehuda Sadeh
On Mon, Mar 18, 2013 at 2:50 AM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
 Hi,


 I've just noticed something rather worrying on our cluster.

 Some files are apparently truncated. From a first look, it happened on
 files where there was a metadata update right after the file was
 stored. The exact sequence was:

  - PUT to store the file
  - GET to get the file (which at that point is still correct and has
 the proper length)
  - PUT using a 'copy source' over itself to update the metadata

 all of these happening sequentially within the same second, very quickly.
 
 Subsequent GETs then return a truncated file.


 I'm looking into it to narrow down the issue, but I wanted to know if
 anyone has seen something similar?
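(For reference: the "copy source over itself" step described above is the standard S3 way to rewrite an object's metadata in place, using the `x-amz-copy-source` and `x-amz-metadata-directive: REPLACE` headers. A minimal sketch of the three requests in the sequence, with made-up bucket/key/metadata names and a purely illustrative helper, not the actual client code used here:)

```python
# Sketch of the three S3 requests in the sequence above.
# Bucket, key, and metadata values are hypothetical.

def build_requests(bucket, key, new_meta):
    """Return (method, path, headers) triples for:
    1. PUT to store the object,
    2. GET to read it back (still intact at this point),
    3. PUT with x-amz-copy-source pointing at the object itself,
       which replaces its metadata in place."""
    path = "/%s/%s" % (bucket, key)
    copy_headers = {
        # Copy the object onto itself ...
        "x-amz-copy-source": path,
        # ... and replace (rather than carry over) its metadata.
        "x-amz-metadata-directive": "REPLACE",
    }
    for k, v in new_meta.items():
        copy_headers["x-amz-meta-" + k] = v
    return [
        ("PUT", path, {}),            # store the file
        ("GET", path, {}),            # read it back
        ("PUT", path, copy_headers),  # metadata update via self-copy
    ]

reqs = build_requests("rb", "myobject", {"state": "imported"})
```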


What version are you using? Do you have logs?

Thanks,
Yehuda
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi,


 What version are you using? Do you have logs?

I'm running a custom build of 0.56.3 plus some patches (basically up
to 7889c5412, plus fixes for #4150 and #4177).

I don't have any radosgw logs (the debug level is set to 0 and it
didn't output anything).
I do have the HTTP logs:

10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] PUT
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
HTTP/1.1 200 0 - Boto/2.6.0 (linux2)
10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +] GET
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX
HTTP/1.1 200 622080 - python-requests
10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] PUT
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
HTTP/1.1 200 146 - Boto/2.6.0 (linux2)
10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +] GET
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX
HTTP/1.1 200 461220 - python-requests
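
(The truncation shows up in those logs as the GET byte count shrinking across the self-copy PUT: 622080 before, 461220 after. A throwaway scanner for that pattern in similar access logs; the log format assumed here is reverse-engineered from the lines above, not radosgw's documented format:)

```python
import re

# Match "METHOD /bucket/key[?query] HTTP/1.1 status bytes" out of
# access-log text shaped like the lines above (format assumed).
# \s+ also matches newlines, so entries wrapped across lines still parse.
LINE_RE = re.compile(
    r'(PUT|GET)\s+/(\S+?)(?:\?\S*)?\s+HTTP/1\.1\s+(\d+)\s+(\d+)')

def find_shrunken_objects(log_text):
    """Return the set of object paths whose successful GET size
    decreased over time, i.e. candidates for this truncation."""
    sizes = {}      # object path -> last GET size seen
    shrunk = set()
    for m in LINE_RE.finditer(log_text):
        method, path, status, nbytes = (
            m.group(1), m.group(2), m.group(3), int(m.group(4)))
        if method == "GET" and status == "200":
            if path in sizes and nbytes < sizes[path]:
                shrunk.add(path)
            sizes[path] = nbytes
    return shrunk
```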


Cheers,

   Sylvain


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Yehuda Sadeh
On Mon, Mar 18, 2013 at 7:40 AM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
 Hi,


 What version are you using? Do you have logs?

 I'm running a custom build of 0.56.3 plus some patches (basically up
 to 7889c5412, plus fixes for #4150 and #4177).

 I don't have any radosgw logs (the debug level is set to 0 and it
 didn't output anything).
 I do have the HTTP logs:

 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] PUT
 /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
 HTTP/1.1 200 0 - Boto/2.6.0 (linux2)
 10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +] GET
 /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX
 HTTP/1.1 200 622080 - python-requests
 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] PUT
 /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
 HTTP/1.1 200 146 - Boto/2.6.0 (linux2)
 10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +] GET
 /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX
 HTTP/1.1 200 461220 - python-requests


I can't make much out of these; we'll probably need rgw logs (and
preferably with 'debug ms = 1' as well) for this issue.
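
(For anyone hitting something similar: the extra logging asked for here can be enabled in the gateway's ceph.conf section, roughly like this. The section name depends on how your radosgw instance is named; 'debug rgw = 20' is the usual verbose gateway level:)

```ini
[client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1
```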

Yehuda


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi,

 I can't make much out of these; we'll probably need rgw logs (and
 preferably with 'debug ms = 1' as well) for this issue.

Well, the problem is that I can't make it happen again: it happened
4 times during an import of ~3000 files. I'm trying to reproduce
this on a test cluster but so far, no luck. I'll give it another shot
tomorrow.

And I can't enable debugging in production for long periods: log space
is limited and would fill up in minutes with all the requests. I've
also disabled the use of COPY in production anyway, because I can't
have it corrupting random customer files.


Cheers,

Sylvain