Re: [ceph-users] RGW - Can't download complete object
The code has been backported and should be part of the firefly 0.80.10 release and the hammer 0.94.2 release.

Nathan

On 05/14/2015 07:30 AM, Yehuda Sadeh-Weinraub wrote:
> The code is in wip-11620, and it's currently on top of the next branch.
> We'll get it through the tests, then get it into hammer and firefly. I
> wouldn't recommend installing it in production without proper testing
> first.
>
> Yehuda
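The release floor Nathan mentions can be written down as a small check. This is only a sketch: the helper names `parse_version` and `has_backport` and the lookup table are assumptions made for illustration, and the two version numbers are the ones the message says the fix "should" land in, not a confirmed compatibility matrix.

```python
# Hypothetical helper. The table mirrors the releases named in this thread
# (firefly 0.80.10, hammer 0.94.2); it is an assumption, not an official list.
FIRST_FIXED = {
    (0, 80): (0, 80, 10),  # firefly series
    (0, 94): (0, 94, 2),   # hammer series
}


def parse_version(text):
    """Turn a dotted version string such as '0.94.2' into (0, 94, 2)."""
    return tuple(int(piece) for piece in text.strip().split("."))


def has_backport(version_text):
    """Return True or False for a known series, None for an unknown one."""
    version = parse_version(version_text)
    floor = FIRST_FIXED.get(version[:2])
    if floor is None:
        return None  # series not mentioned in the thread, so no claim is made
    return version >= floor


for candidate in ("0.80.9", "0.80.10", "0.94.1", "0.94.2"):
    print(candidate, has_backport(candidate))
```

Comparing integer tuples avoids the string-comparison trap where "0.80.9" would sort after "0.80.10".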
Re: [ceph-users] RGW - Can't download complete object
The code is in wip-11620, and it's currently on top of the next branch. We'll get it through the tests, then get it into hammer and firefly. I wouldn't recommend installing it in production without proper testing first.

Yehuda

----- Original Message -----
> From: "Sean Sullivan"
> To: "Yehuda Sadeh-Weinraub"
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 7:22:10 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
>
> Thank you so much Yehuda! I look forward to testing these. Is there a way
> for me to pull this code in? Is it in master?
Re: [ceph-users] RGW - Can't download complete object
Thank you so much Yehuda! I look forward to testing these. Is there a way for me to pull this code in? Is it in master?

On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub wrote:
> Ok, I dug a bit more, and it seems to me that the problem is with the
> manifest that was created. I was able to reproduce a similar issue (opened
> ceph bug #11622), for which I also have a fix.
>
> I created new tests to cover this issue, and we'll get those recent fixes
> as soon as we can, after we test for any regressions.
>
> Thanks,
> Yehuda
Re: [ceph-users] RGW - Can't download complete object
Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix.

I created new tests to cover this issue, and we'll get those recent fixes as soon as we can, after we test for any regressions.

Thanks,
Yehuda

----- Original Message -----
> From: "Yehuda Sadeh-Weinraub"
> To: "Sean Sullivan"
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 2:33:07 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
>
> That's another interesting issue. Note that for part 12_80 the manifest
> specifies (I assume, by the messenger log) this part:
>
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')
>
> whereas it seems that you do have the original part:
>
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
> (note the '2/...')
>
> The part that the manifest specifies does not exist, which makes me think
> that there is some weird upload sequence, something like:
>
> - client uploads part, upload finishes but client does not get ack for it
> - client retries (second upload)
> - client gets ack for the first upload and gives up on the second one
>
> But I'm not sure if it would explain the manifest, I'll need to take a look
> at the code. Could such a sequence happen with the client that you're using
> to upload?
>
> Yehuda
Re: [ceph-users] RGW - Can't download complete object
That's another interesting issue. Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')

whereas it seems that you do have the original part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
(note the '2/...')

The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:

- client uploads part, upload finishes but client does not get ack for it
- client retries (second upload)
- client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload?

Yehuda

----- Original Message -----
> From: "Sean Sullivan"
> To: "Yehuda Sadeh-Weinraub"
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 2:07:22 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
>
> Sorry for the delay. It took me a while to figure out how to do a range
> request and append the data to a single file. The good news is that the end
> file seems to be 14G in size, which matches the file's manifest size. The bad
> news is that the file is completely corrupt and the radosgw log has errors.
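The comparison Yehuda makes by eye (same part suffix, different tag in the middle of the name) can be sketched in code. This is an illustration only: the name layout `<marker>__shadow_<key>.<tag>.<part>_<index>` is inferred from the two names quoted in this message, the meaning of the second number is an assumption, and `split_shadow_name` and `missing_with_sibling` are hypothetical helpers, not part of any Ceph tool.

```python
KEY = "b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam"
PREFIX = "default.20283.1__shadow_" + KEY + "."


def split_shadow_name(name, key):
    """Split a shadow object name into (tag, part, index).

    Assumes the layout seen in this thread; real names may differ.
    """
    marker = "__shadow_" + key + "."
    _, found, tail = name.partition(marker)
    if not found:
        raise ValueError("not a shadow object of %r: %r" % (key, name))
    tag, _, part_index = tail.rpartition(".")
    part, _, index = part_index.partition("_")
    return tag, int(part), int(index)


def missing_with_sibling(expected, existing, key):
    """For each expected name that is absent, list the tags that do hold
    an object with the same part and index numbers."""
    by_slot = {}
    for name in existing:
        tag, part, index = split_shadow_name(name, key)
        by_slot.setdefault((part, index), []).append(tag)
    present = set(existing)
    report = []
    for name in expected:
        if name in present:
            continue
        tag, part, index = split_shadow_name(name, key)
        report.append((part, index, tag, by_slot.get((part, index), [])))
    return report


# The name the manifest appears to reference, and the name that exists.
expected = [PREFIX + "tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80"]
existing = [PREFIX + "2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80"]
report = missing_with_sibling(expected, existing, KEY)
print(report)
```

Feeding it the full list of names from the manifest and the full `rados ls` output would show whether 12_80 is the only slot affected, under the stated layout assumption.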
Re: [ceph-users] RGW - Can't download complete object
Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size, which matches the file's manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors. I am using the following code to perform the download::

    https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py

Here is a clip of the log file::

    2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 <== osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
    2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
    2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 <== osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
    2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304
    2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2
    2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 <== osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
    2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
    2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/108 <== osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8 [read 0~4194304] v0'0 uv41566 ondisk = 0) v6 303+0+4194304 (1474509283 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
    2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12917407744 len=4194304

I couldn't really find any good documentation on how fragments/files are laid out on the object file system, so I am not sure where the file will be. How could the 4MB object have issues while the cluster is completely healthy? I did do the rados stat of each object inside ceph and they all appear to be there::

    http://paste.ubuntu.com/8561/

The sum of all of the objects :: 14584887282
The stat of the object inside ceph:: 14577056082

So for some reason I have more data in objects than the key manifest. We easily identified this object via the same method as the other thread I have::

    for key in keys:
        if ( key.name == 'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam' ):
            implicit = key.size
            explicit = conn.get_bucket(bucket).get_key(key.name).size
            absolute = abs(implicit - explicit)
            print key.name
            print implicit
            print explicit

    b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam
    14578628946
    14577056082

So it looks like I have 3 different sizes. I figure this may be the network issue that was mentioned in the other thread, but seeing as this is not the first 512k and the overall size still matches, as well as the errors I am seeing in the gateway, I feel that this may be a bigger issue.

Has anyone seen this before? The only mention of the "got unexpected error when trying to read object" is here (http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-May/021688.html) but my google skills are pretty poor.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
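The three sizes in this message can be compared with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names follow the snippet in the message, the `describe` helper is hypothetical, and checking against 512 KiB is an assumption prompted by the "512k" issue mentioned in the other thread, not an established link.

```python
object_sum = 14584887282  # sum of the per-object `rados stat` sizes
implicit = 14578628946    # key.size as printed first in the message
explicit = 14577056082    # get_key(...).size, same as the object stat

HALF_MIB = 512 * 1024


def describe(delta):
    """Return (delta, whole 512 KiB units, leftover bytes)."""
    whole, rest = divmod(delta, HALF_MIB)
    return delta, whole, rest


print(describe(implicit - explicit))    # (1572864, 3, 0)
print(describe(object_sum - explicit))  # (7831200, 14, 491168)
```

The first difference is exactly three 512 KiB units, while the second is not a whole number of them, so the two discrepancies may not share a cause.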
Re: [ceph-users] RGW - Can't download complete object
----- Original Message -----
> From: "Sean"
> To: ceph-users@lists.ceph.com
> Sent: Thursday, May 7, 2015 3:35:14 PM
> Subject: [ceph-users] RGW - Can't download complete object
>
> I have another thread going on about truncation of objects and I believe
> this is a separate but equally bad issue in civetweb/radosgw. My cluster
> is completely healthy.
>
> I have one (possibly more) objects stored in ceph rados gateway that
> will return a different size every time I try to download it::
>
> http://pastebin.com/hK1iqXZH --- ceph -s
> http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object

The two interesting things that I see here are:
- the multipart upload size for each part is on the big side (is it 1GB for each part?)
- it seems that there are a lot of parts that suffered from retries, which could be a source for the 512k issue

> http://pastebin.com/5TnvgMrX --- python download code
>
> The weird part is every time I download the file it is of a different
> size. I am grabbing the individual objects of the 14G file and will
> update this email once I have them all statted out. Currently I am
> getting, on average, 1.5G to 2G files when the total object should be
> 14G in size.
>
> lacadmin@kh10-9:~$ python corruptpull.py
> the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1
> the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2
> the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3
> the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4
> the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5
> the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6
> the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7
> the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8
> the download failed. The filesize = 2109210986. The actual size is 14577056082. Attempts = 9
> the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10
> the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11
> the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12
> the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13
> the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14
> the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15
> the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16
> the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17
> the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18
> the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19
> the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20
> the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21
>
> Notice it is always different. Once the rados -p .rgw.buckets ls | grep
> finishes I will return the listing of objects as well but this is quite
> odd and I think this is a separate issue.
>
> Has anyone seen this before? Why wouldn't radosgw return an error and
> why am I getting different file sizes?

Usually that means that there was some error in the middle of the download, maybe a client to radosgw communication issue. What does the radosgw log show when this happens?

> I would post the log from radosgw but I don't see any "err|wrn|fatal"
> mentions in the log and the client completes without issue every time.
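One way to look for failures that a grep for "err|wrn|fatal" would miss is to scan the messenger lines for `osd_op_reply` entries with a negative result. The sketch below is an assumption-laden illustration: the regular expression is modeled only on the log excerpts posted elsewhere in this thread, `failed_replies` is a hypothetical helper, and the two sample lines are trimmed copies of those excerpts.

```python
import re

# Modeled on lines such as:
#   osd_op_reply(<tid> <object> [read 0~4194304] v0'0 uv0 ack = -2 (...)) ...
REPLY = re.compile(r"osd_op_reply\((\d+)\s+(\S+)\s+\[([^\]]*)\].*?\s=\s(-?\d+)")


def failed_replies(lines):
    """Yield (object, op, result) for every reply with a negative result."""
    for line in lines:
        match = REPLY.search(line)
        if match and int(match.group(4)) < 0:
            yield match.group(2), match.group(3), int(match.group(4))


BASE = ("default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
        "28357709e44fff211de63b1d2c437159.bam.")
SAMPLE = [
    "osd_op_reply(74566287 " + BASE + "2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 "
    "[read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004",
    "osd_op_reply(74566142 " + BASE + "tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 "
    "[read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0",
]

failures = list(failed_replies(SAMPLE))
for obj, op, result in failures:
    print(result, op, obj)
```

These messenger lines only appear when the gateway runs with a high enough messenger debug level, so an empty result does not by itself prove that no read failed.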
[ceph-users] RGW - Can't download complete object
I have another thread going on about truncation of objects and I believe this is a separate but equally bad issue in civetweb/radosgw. My cluster is completely healthy.

I have one (possibly more) objects stored in ceph rados gateway that will return a different size every time I try to download it::

    http://pastebin.com/hK1iqXZH --- ceph -s
    http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object
    http://pastebin.com/5TnvgMrX --- python download code

The weird part is every time I download the file it is of a different size. I am grabbing the individual objects of the 14G file and will update this email once I have them all statted out. Currently I am getting, on average, 1.5G to 2G files when the total object should be 14G in size.

    lacadmin@kh10-9:~$ python corruptpull.py
    the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1
    the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2
    the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3
    the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4
    the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5
    the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6
    the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7
    the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8
    the download failed. The filesize = 2109210986. The actual size is 14577056082. Attempts = 9
    the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10
    the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11
    the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12
    the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13
    the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14
    the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15
    the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16
    the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17
    the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18
    the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19
    the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20
    the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21

Notice it is always different. Once the rados -p .rgw.buckets ls | grep finishes I will return the listing of objects as well but this is quite odd and I think this is a separate issue.

Has anyone seen this before? Why wouldn't radosgw return an error and why am I getting different file sizes?

I would post the log from radosgw but I don't see any "err|wrn|fatal" mentions in the log and the client completes without issue every time.
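The retry loop that produces output like the above can be sketched as follows. The actual script is only linked (pastebin) and not shown in the thread, so this is a guess at its shape rather than the author's code: `download_until_complete` and `make_flaky_fetch` are hypothetical names, and `fetch` stands in for whatever S3 client call writes the object to disk.

```python
import os


def download_until_complete(fetch, path, expected_size, max_attempts=5):
    """Call fetch(path) until the file on disk has expected_size bytes.

    Returns the number of attempts used; raises RuntimeError if every
    attempt produced a file of the wrong size.
    """
    for attempt in range(1, max_attempts + 1):
        fetch(path)
        actual = os.path.getsize(path)
        if actual == expected_size:
            return attempt
        print("the download failed. The filesize = %d. The actual size is %d. "
              "Attempts = %d" % (actual, expected_size, attempt))
    raise RuntimeError("no complete download after %d attempts" % max_attempts)


def make_flaky_fetch(sizes):
    """Test stand-in: each call writes a file of the next size in `sizes`."""
    remaining = iter(sizes)

    def fetch(path):
        with open(path, "wb") as handle:
            handle.write(b"\0" * next(remaining))

    return fetch
```

A size check like this only catches truncation. The later report of a correctly sized but corrupt file suggests also comparing a checksum where one is available.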