Re: [ceph-users] RGW - Can't download complete object

2015-05-30 Thread Nathan Cutler
The code has been backported and should be part of the firefly 0.80.10 
release and the hammer 0.94.2 release.


Nathan

On 05/14/2015 07:30 AM, Yehuda Sadeh-Weinraub wrote:

The code is in wip-11620, and it's currently on top of the next branch. We'll 
get it through the tests, then get it into hammer and firefly. I wouldn't 
recommend installing it in production without proper testing first.

Yehuda

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
The code is in wip-11620, and it's currently on top of the next branch. We'll 
get it through the tests, then get it into hammer and firefly. I wouldn't 
recommend installing it in production without proper testing first.

Yehuda

- Original Message -
> From: "Sean Sullivan" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 7:22:10 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
> 
> Thank you so much Yehuda! I look forward to testing these. Is there a way
> for me to pull this code in? Is it in master?

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Sean Sullivan
Thank you so much Yehuda! I look forward to testing these. Is there a way 
for me to pull this code in? Is it in master?



On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub  wrote:

Ok, I dug a bit more, and it seems to me that the problem is with the 
manifest that was created. I was able to reproduce a similar issue (opened 
ceph bug #11622), for which I also have a fix.


I created new tests to cover this issue, and we'll get those recent fixes 
as soon as we can, after we test for any regressions.


Thanks,
Yehuda

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
Ok, I dug a bit more, and it seems to me that the problem is with the manifest 
that was created. I was able to reproduce a similar issue (opened ceph bug 
#11622), for which I also have a fix.

I created new tests to cover this issue, and we'll get those recent fixes merged 
as soon as we can, after we test for any regressions.

Thanks,
Yehuda

- Original Message -
> From: "Yehuda Sadeh-Weinraub" 
> To: "Sean Sullivan" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 2:33:07 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
> 
> That's another interesting issue. Note that for part 12_80 the manifest
> specifies (I assume, by the messenger log) this part:
> 
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')
> 
> whereas it seems that you do have the original part:
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
> (note the '2/...')
> 
> The part that the manifest specifies does not exist, which makes me think
> that there is some weird upload sequence, something like:
> 
>  - client uploads part, upload finishes but client does not get ack for it
>  - client retries (second upload)
>  - client gets ack for the first upload and gives up on the second one
> 
> But I'm not sure if it would explain the manifest, I'll need to take a look
> at the code. Could such a sequence happen with the client that you're using
> to upload?
> 
> Yehuda

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
That's another interesting issue. Note that for part 12_80 the manifest 
specifies (I assume, by the messenger log) this part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')

whereas it seems that you do have the original part:
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
(note the '2/...')

The part that the manifest specifies does not exist, which makes me think that 
there is some weird upload sequence, something like:

 - client uploads part, upload finishes but client does not get ack for it
 - client retries (second upload)
 - client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest, I'll need to take a look at 
the code. Could such a sequence happen with the client that you're using to 
upload?

Yehuda
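
The comparison described above (a manifest entry that does not exist, next to an existing object with the same part/stripe suffix) can be sketched roughly as follows. This is only an illustration: the helper names split_part_name and find_renamed_parts are made up here, and the assumption that the text after the last '.' is the part/stripe suffix is inferred from the two object names in this thread, not from the RGW code.

```python
def split_part_name(name):
    """Split 'prefix.<part>_<stripe>' into (prefix, '<part>_<stripe>')."""
    prefix, _, suffix = name.rpartition(".")
    return prefix, suffix


def find_renamed_parts(manifest_names, existing_names):
    """Map each missing manifest object to existing objects that share its
    part/stripe suffix but carry a different prefix."""
    existing = set(existing_names)
    by_suffix = {}
    for name in existing:
        by_suffix.setdefault(split_part_name(name)[1], []).append(name)
    result = {}
    for name in manifest_names:
        if name in existing:
            continue  # the manifest entry is present, nothing to report
        suffix = split_part_name(name)[1]
        result[name] = sorted(by_suffix.get(suffix, []))
    return result


if __name__ == "__main__":
    base = ("default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
            "28357709e44fff211de63b1d2c437159.bam.")
    manifest = [base + "tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80"]
    on_disk = [base + "2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80"]
    for missing, candidates in find_renamed_parts(manifest, on_disk).items():
        print("missing:", missing)
        print("same suffix:", candidates)
```

Feeding it the manifest's object names and the output of a rados listing would show which parts were written under a different upload prefix than the one the manifest records.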

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Sean Sullivan
Sorry for the delay. It took me a while to figure out how to do a range request 
and append the data to a single file. The good news is that the end file seems 
to be 14G in size, which matches the file's manifest size. The bad news is that 
the file is completely corrupt and the radosgw log has errors. I am using the 
following code to perform the download::

https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
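
For anyone following along, the range-request idea is roughly the following. This is a minimal sketch of how the byte ranges could be computed, not the logic of the script linked above; the function names and the 32 MiB chunk size are assumptions for illustration.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets covering total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Format an HTTP Range header value for an inclusive byte span."""
    return "bytes=%d-%d" % (start, end)


if __name__ == "__main__":
    total = 14577056082        # object size reported in this thread
    chunk = 32 * 1024 * 1024   # hypothetical chunk size
    spans = list(byte_ranges(total, chunk))
    print(len(spans), range_header(*spans[0]), range_header(*spans[-1]))
```

Each span is fetched with its own Range header and written at offset start in the output file, so the pieces can arrive in any order.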

Here is a clip of the log file::
--
2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/108 <== osd.11 
10.64.64.101:6809/942707 5  osd_op_reply(74566287 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12
 [read 0~858004] v0'0 uv41308 ondisk = 0) v6  304+0+858004 (1180387808 0 
2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io 
completion ofs=12934184960 len=858004
2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/108 <== osd.45 
10.64.64.101:6845/944590 2  osd_op_reply(74566142 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6  
302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io 
completion ofs=12145655808 len=4194304

2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error when 
trying to read object: -2

2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/108 <== osd.21 
10.64.64.102:6856/1133473 16  osd_op_reply(74566144 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12
 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6  304+0+3671316 (1695485150 0 
3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io 
completion ofs=10786701312 len=3671316
2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/108 <== osd.82 
10.64.64.103:6857/88524 2  osd_op_reply(74566283 
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8
 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6  303+0+4194304 (1474509283 0 
3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io 
completion ofs=12917407744 len=4194304
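
To pick the failing reads out of a longer log, something like the sketch below could help. The regular expression is inferred only from the few osd_op_reply lines in this clip, so treat the pattern as an assumption rather than a stable log format; failed_reads is a made-up helper name.

```python
import re

# Pattern inferred from the read replies in the clip above (an assumption).
REPLY_RE = re.compile(
    r"osd_op_reply\(\d+\s+(?P<obj>\S+)\s+\[read (?P<off>\d+)~(?P<len>\d+)\]"
    r".*?(?:ack|ondisk) = (?P<rc>-?\d+)"
)


def failed_reads(log_text):
    """Return (object_name, return_code) for every read reply with rc != 0."""
    # Undo hard wrapping so each reply sits on one logical line.
    flat = " ".join(log_text.split())
    return [
        (m.group("obj"), int(m.group("rc")))
        for m in REPLY_RE.finditer(flat)
        if int(m.group("rc")) != 0
    ]
```

Run over the clip above, this should report only the .12_80 object that came back with -2 (No such file or directory).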

I couldn't really find any good documentation on how fragments/files are laid 
out on the object file system, so I am not sure where the file will be. How 
could the 4 MB object have issues while the cluster is completely healthy? I 
did do the rados stat of each object inside ceph and they all appear to be 
there::

http://paste.ubuntu.com/8561/

The sum of all of the objects :: 14584887282
The stat of the object inside ceph:: 14577056082

So for some reason I have more data in objects than the key manifest. We 
easily identified this object via the same method as in the other thread I have::

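# keys, conn and bucket come from earlier in the session (not shown here)
for key in keys:
    if key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam':
        implicit = key.size  # size as reported by the key listing
        explicit = conn.get_bucket(bucket).get_key(key.name).size  # size from fetching the key directly
        absolute = abs(implicit - explicit)  # difference between the two sizes
        print(key.name)
        print(implicit)
        print(explicit)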

b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
14578628946
14577056082

So it looks like I have 3 different sizes. I figure this may be the network 
issue that was mentioned in the other thread, but seeing as this is not the 
first 512k and the overall size still matches, as well as the errors I am 
seeing in the gateway, I feel that this may be a bigger issue. 
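
To put numbers on the three sizes, here is the arithmetic as a small sketch. The labels are just descriptions of where each figure in this mail came from.

```python
# The three sizes quoted in this mail; the labels are descriptive only.
sizes = {
    "sum of rados objects": 14584887282,
    "implicit (key.size)": 14578628946,
    "explicit (get_key / stat)": 14577056082,
}
baseline = sizes["explicit (get_key / stat)"]

# How far each figure is above the stat size.
diffs = {label: size - baseline for label, size in sizes.items()}
for label, delta in diffs.items():
    print("%-26s %d bytes over the stat size" % (label, delta))
```

The listing size is 1572864 bytes over the stat size, which is exactly 3 x 524288 (three 512k blocks), and the summed rados objects are 7831200 bytes over it. That is only arithmetic on the numbers above, not a diagnosis.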

Has anyone seen this before?  The only mention of the "got unexpected error 
when trying to read object" is here 
(http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html) but 
my google skills are pretty poor. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW - Can't download complete object

2015-05-07 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Sean" 
> To: ceph-users@lists.ceph.com
> Sent: Thursday, May 7, 2015 3:35:14 PM
> Subject: [ceph-users] RGW - Can't download complete object
> 
> I have another thread going on about truncation of objects and I believe
> this is a separate but equally bad issue in civetweb/radosgw. My cluster
> is completely healthy.
> 
> I have one (possibly more) objects stored in ceph rados gateway that
> will return a different size every time I try to download it::
> 
> http://pastebin.com/hK1iqXZH --- ceph -s
> http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object

The two interesting things that I see here are:
 - the multipart upload size for each part is on the big side (is it 1GB for 
each part?)
 - it seems that there are a lot of parts that suffered from retries, which 
could be a source of the 512k issue

> http://pastebin.com/5TnvgMrX --- python download code
> 
> The weird part is every time I download the file it is of a different
> size. I am grabbing the individual objects of the 14G file and will
> update this email once I have them all statted out. Currently I am
> getting, on average, 1.5G to 2G files when the total object should be
> 14G in size.
> 
> lacadmin@kh10-9:~$ python corruptpull.py
> the download failed. The filesize = 2125988202. The actual size is
> 14577056082. Attempts = 1
> the download failed. The filesize = 2071462250. The actual size is
> 14577056082. Attempts = 2
> the download failed. The filesize = 2016936298. The actual size is
> 14577056082. Attempts = 3
> the download failed. The filesize = 1643643242. The actual size is
> 14577056082. Attempts = 4
> the download failed. The filesize = 1597505898. The actual size is
> 14577056082. Attempts = 5
> the download failed. The filesize = 2075656554. The actual size is
> 14577056082. Attempts = 6
> the download failed. The filesize = 650117482. The actual size is
> 14577056082. Attempts = 7
> the download failed. The filesize = 1987576170. The actual size is
> 14577056082. Attempts = 8
> the download failed. The filesize = 2109210986. The actual size is
> 14577056082. Attempts = 9
> the download failed. The filesize = 2142765418. The actual size is
> 14577056082. Attempts = 10
> the download failed. The filesize = 2134376810. The actual size is
> 14577056082. Attempts = 11
> the download failed. The filesize = 2146959722. The actual size is
> 14577056082. Attempts = 12
> the download failed. The filesize = 2142765418. The actual size is
> 14577056082. Attempts = 13
> the download failed. The filesize = 1467482474. The actual size is
> 14577056082. Attempts = 14
> the download failed. The filesize = 2046296426. The actual size is
> 14577056082. Attempts = 15
> the download failed. The filesize = 2021130602. The actual size is
> 14577056082. Attempts = 16
> the download failed. The filesize = 177366. The actual size is
> 14577056082. Attempts = 17
> the download failed. The filesize = 2146959722. The actual size is
> 14577056082. Attempts = 18
> the download failed. The filesize = 2016936298. The actual size is
> 14577056082. Attempts = 19
> the download failed. The filesize = 1983381866. The actual size is
> 14577056082. Attempts = 20
> the download failed. The filesize = 2134376810. The actual size is
> 14577056082. Attempts = 21
> 
> Notice it is always different. Once the rados -p .rgw.buckets ls | grep
> finishes I will return the listing of objects as well but this is quite
> odd and I think this is a separate issue.
> 
> Has anyone seen this before? Why wouldn't radosgw return an error and
> why am I getting different file sizes?

Usually that means that there was some error in the middle of the download, 
maybe a client-to-radosgw communication issue. What does the radosgw log show 
when this happens?

> 
> I would post the log from radosgw but I don't see any "err|wrn|fatal"
> mentions in the log and the client completes without issue every time.


[ceph-users] RGW - Can't download complete object

2015-05-07 Thread Sean
I have another thread going on about truncation of objects and I believe 
this is a separate but equally bad issue in civetweb/radosgw. My cluster 
is completely healthy.


I have one (possibly more) objects stored in ceph rados gateway that 
will return a different size every time I try to download it::


http://pastebin.com/hK1iqXZH --- ceph -s
http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object
http://pastebin.com/5TnvgMrX --- python download code

The weird part is every time I download the file it is of a different 
size. I am grabbing the individual objects of the 14G file and will 
update this email once I have them all statted out. Currently I am 
getting, on average, 1.5G to 2G files when the total object should be 
14G in size.


lacadmin@kh10-9:~$ python corruptpull.py
the download failed. The filesize = 2125988202. The actual size is 
14577056082. Attempts = 1
the download failed. The filesize = 2071462250. The actual size is 
14577056082. Attempts = 2
the download failed. The filesize = 2016936298. The actual size is 
14577056082. Attempts = 3
the download failed. The filesize = 1643643242. The actual size is 
14577056082. Attempts = 4
the download failed. The filesize = 1597505898. The actual size is 
14577056082. Attempts = 5
the download failed. The filesize = 2075656554. The actual size is 
14577056082. Attempts = 6
the download failed. The filesize = 650117482. The actual size is 
14577056082. Attempts = 7
the download failed. The filesize = 1987576170. The actual size is 
14577056082. Attempts = 8
the download failed. The filesize = 2109210986. The actual size is 
14577056082. Attempts = 9
the download failed. The filesize = 2142765418. The actual size is 
14577056082. Attempts = 10
the download failed. The filesize = 2134376810. The actual size is 
14577056082. Attempts = 11
the download failed. The filesize = 2146959722. The actual size is 
14577056082. Attempts = 12
the download failed. The filesize = 2142765418. The actual size is 
14577056082. Attempts = 13
the download failed. The filesize = 1467482474. The actual size is 
14577056082. Attempts = 14
the download failed. The filesize = 2046296426. The actual size is 
14577056082. Attempts = 15
the download failed. The filesize = 2021130602. The actual size is 
14577056082. Attempts = 16
the download failed. The filesize = 177366. The actual size is 
14577056082. Attempts = 17
the download failed. The filesize = 2146959722. The actual size is 
14577056082. Attempts = 18
the download failed. The filesize = 2016936298. The actual size is 
14577056082. Attempts = 19
the download failed. The filesize = 1983381866. The actual size is 
14577056082. Attempts = 20
the download failed. The filesize = 2134376810. The actual size is 
14577056082. Attempts = 21


Notice it is always different. Once the rados -p .rgw.buckets ls | grep 
finishes I will return the listing of objects as well but this is quite 
odd and I think this is a separate issue.
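
For reference, the size check behind the output above amounts to something like the sketch below. It is not the pastebin script itself: download_with_check and the fetch callable are hypothetical stand-ins for whatever actually pulls the object, and the message only mirrors the format printed above.

```python
import os


def download_with_check(fetch, dest, expected_size, max_attempts=3):
    """Call fetch(dest) until the file on disk has expected_size bytes.

    Returns the attempt number that succeeded, or raises IOError once
    max_attempts downloads have all come back with the wrong size.
    """
    for attempt in range(1, max_attempts + 1):
        fetch(dest)  # hypothetical callable that writes the object to dest
        actual = os.path.getsize(dest)
        if actual == expected_size:
            return attempt
        print("the download failed. The filesize = %d. The actual size is %d. "
              "Attempts = %d" % (actual, expected_size, attempt))
    raise IOError("size mismatch after %d attempts" % max_attempts)
```

The point of comparing against the size from the object stat is that the client library reports success even when the body is short, so the only signal is the byte count on disk.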


Has anyone seen this before? Why wouldn't radosgw return an error and 
why am I getting different file sizes?


I would post the log from radosgw but I don't see any "err|wrn|fatal" 
mentions in the log and the client completes without issue every time.


