Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Sylvain Munaut
Hi,

> Can't make much out of it; I'll probably need rgw logs (preferably
> with 'debug ms = 1' as well) for this issue.

Well, the problem is that I can't make it happen again... It happened
4 times during an import of ~3000 files. I'm trying to reproduce it
on a test cluster but so far, no luck. I'll give it another shot
tomorrow.

And I can't enable debug logging in production for long periods: log
space is limited and would fill up within minutes with all the
requests. I've also disabled the use of COPY in production anyway,
because I can't risk it corrupting random customer files.


Cheers,

Sylvain
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Yehuda Sadeh
On Mon, Mar 18, 2013 at 7:40 AM, Sylvain Munaut wrote:
> Hi,
>
>
>> What version are you using? Do you have logs?
>
> I'm running a custom build of 0.56.3 + some patches (basically up
> to 7889c5412 + fixes for #4150 and #4177).
>
> I don't have any radosgw logs (debug level is set to 0 and it didn't
> output anything).
> I have the HTTP logs:
>
> 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT
> /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
> HTTP/1.1" 200 0 "-" "Boto/2.6.0 (linux2)"
> 10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +] "GET
> /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX
> HTTP/1.1" 200 622080 "-" "python-requests"
> 10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT
> /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
> HTTP/1.1" 200 146 "-" "Boto/2.6.0 (linux2)"
> 10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +] "GET
> /rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX
> HTTP/1.1" 200 461220 "-" "python-requests"
>
>
Can't make much out of it; I'll probably need rgw logs (preferably
with 'debug ms = 1' as well) for this issue.

Yehuda


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Sylvain Munaut
Hi,


> What version are you using? Do you have logs?

I'm running a custom build of 0.56.3 + some patches (basically up
to 7889c5412 + fixes for #4150 and #4177).

I don't have any radosgw logs (debug level is set to 0 and it didn't
output anything).
I have the HTTP logs:

10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
HTTP/1.1" 200 0 "-" "Boto/2.6.0 (linux2)"
10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +] "GET
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX
HTTP/1.1" 200 622080 "-" "python-requests"
10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34
HTTP/1.1" 200 146 "-" "Boto/2.6.0 (linux2)"
10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +] "GET
/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX
HTTP/1.1" 200 461220 "-" "python-requests"
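The wrapped access-log entries above can be checked mechanically for the symptom in this thread: the same object served with fewer bytes on a later GET (622080 bytes at 09:23:14, 461220 bytes at 10:14:53). Below is a minimal Python sketch, assuming the lines follow the layout shown here once the mail client's wrapping is undone. The regex, the function names and the "shrank" heuristic are illustrative only, not part of radosgw or any Ceph tooling, and the truncated timezone field ("+") is kept exactly as it appears in the logs.

```python
import re
from collections import defaultdict

# Matches the access-log layout quoted in this thread (one request per line).
# The timestamp is matched loosely because its timezone field is truncated.
LOG_RE = re.compile(
    r'^(?P<client>\S+) (?P<vhost>\S+) - \[(?P<time>[^\]]*)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

KEY = "/rb/138e6898a8039db16df2146398626f0303ae3e97427fdad33c95b6034f690b34"

# The four requests from the report, unwrapped onto single lines.
SAMPLE_LINES = [
    f'10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT {KEY} HTTP/1.1" 200 0 "-" "Boto/2.6.0 (linux2)"',
    f'10.0.0.74 s3.svc - [14/Mar/2013:09:23:14 +] "GET {KEY}?Signature=XXX%3D&Expires=1363256594&AWSAccessKeyId=XXX HTTP/1.1" 200 622080 "-" "python-requests"',
    f'10.0.0.253 s3.svc - [14/Mar/2013:09:23:14 +] "PUT {KEY} HTTP/1.1" 200 146 "-" "Boto/2.6.0 (linux2)"',
    f'10.0.0.74 s3.svc - [14/Mar/2013:10:14:53 +] "GET {KEY}?Signature=XXX%3D&Expires=1363258236&AWSAccessKeyId=XXX HTTP/1.1" 200 461220 "-" "python-requests"',
]


def get_sizes_by_object(lines):
    """Map each object path (query string stripped) to the byte counts of
    its successful GETs, in log order."""
    sizes = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m["method"] != "GET" or m["status"] != "200":
            continue
        if m["size"] == "-":
            continue
        path = m["target"].split("?", 1)[0]
        sizes[path].append(int(m["size"]))
    return dict(sizes)


def shrunk_objects(lines):
    """Return {path: (largest_earlier_size, last_size)} for objects whose
    most recent GET returned fewer bytes than an earlier GET."""
    flagged = {}
    for path, sizes in get_sizes_by_object(lines).items():
        if len(sizes) >= 2 and sizes[-1] < max(sizes[:-1]):
            flagged[path] = (max(sizes[:-1]), sizes[-1])
    return flagged


if __name__ == "__main__":
    for path, (was, now) in shrunk_objects(SAMPLE_LINES).items():
        print(f"{path}: {was} -> {now} bytes")
```

A shrinking GET size is only a hint: a legitimate overwrite with a smaller body would be flagged as well, so flagged keys still need to be checked against the expected content length.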


Cheers,

   Sylvain


Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Yehuda Sadeh
On Mon, Mar 18, 2013 at 2:50 AM, Sylvain Munaut wrote:
> Hi,
>
>
> I've just noticed something rather worrying on our cluster.
>
> Some files are apparently truncated. From the first look I had at it,
> it happened on files where there was a metadata update right after the
> file was stored. The exact sequence was:
>
>  - PUT to store the file
>  - GET to get the file (which at that point is still correct and has
> the proper length)
>  - PUT using a 'copy source' over itself to update the metadata
>
> All of these happened sequentially, very quickly, within the same second.
>
> Subsequent GETs then return a truncated file.
>
>
> I'm looking into it to narrow down the issue, but I wanted to know if
> anyone has seen something similar.
>
>
What version are you using? Do you have logs?

Thanks,
Yehuda


[radosgw] Race condition corrupting data on COPY ?

2013-03-18 Sylvain Munaut
Hi,


I've just noticed something rather worrying on our cluster.

Some files are apparently truncated. From the first look I had at it,
it happened on files where there was a metadata update right after the
file was stored. The exact sequence was:

 - PUT to store the file
 - GET to get the file (which at that point is still correct and has
the proper length)
 - PUT using a 'copy source' over itself to update the metadata

All of these happened sequentially, very quickly, within the same second.

Subsequent GETs then return a truncated file.
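The PUT / GET / self-COPY sequence described in this message can be scripted to try to reproduce the truncation on a test cluster. Below is a minimal Python sketch, assuming a boto3 S3 client pointed at the gateway (the report itself used Boto 2.6.0, so boto3 is a swapped-in client). The endpoint URL, bucket name, key prefix, iteration count and object size are placeholders (the default size is simply the 622080 bytes seen in the logs), and `InMemoryS3` is a hypothetical stand-in that only lets the script be dry-run without a gateway. Because S3 rejects a copy of an object onto itself unless the metadata is replaced, the COPY uses `MetadataDirective="REPLACE"`.

```python
import io
import os
import uuid


def copy_onto_self_repro(s3, bucket, key, body, metadata):
    """PUT, GET, self-COPY with replaced metadata, then GET again.
    Returns the bytes read before and after the self-copy."""
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    before = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    # A copy onto the same key must replace the metadata to be accepted.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=metadata,
        MetadataDirective="REPLACE",
    )
    after = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return before, after


def run(s3, bucket, iterations=100, size=622080):
    """Repeat the sequence on fresh keys and collect (key, len_before,
    len_after) for every object whose content changed."""
    failures = []
    for i in range(iterations):
        key = "copy-repro-" + uuid.uuid4().hex  # hypothetical key prefix
        body = os.urandom(size)
        before, after = copy_onto_self_repro(
            s3, bucket, key, body, {"attempt": str(i)}
        )
        if before != body or after != body:
            failures.append((key, len(before), len(after)))
    return failures


class InMemoryS3:
    """Hypothetical stand-in with the three boto3-style calls used above,
    for dry-running the script without a gateway."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = (bytes(Body), {})

    def get_object(self, Bucket, Key):
        data, meta = self.objects[(Bucket, Key)]
        return {"Body": io.BytesIO(data), "Metadata": dict(meta)}

    def copy_object(self, Bucket, Key, CopySource, Metadata, MetadataDirective):
        data, _ = self.objects[(CopySource["Bucket"], CopySource["Key"])]
        self.objects[(Bucket, Key)] = (data, dict(Metadata))
        return {}


if __name__ == "__main__":
    import boto3  # assumed installed; credentials come from the environment

    client = boto3.client("s3", endpoint_url="http://s3.svc")  # placeholder
    for key, n_before, n_after in run(client, "rb-test"):  # placeholder bucket
        print(f"{key}: {n_before} -> {n_after} bytes")
```

The loop issues the requests back to back from a single thread, which mirrors the reported sequence; it does not by itself exercise concurrent requests on the same object, so failing to reproduce with it would not rule out a race.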


I'm looking into it to narrow down the issue, but I wanted to know if
anyone has seen something similar.


Cheers,

 Sylvain