Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-09 Thread Chris Friesen

On 06/08/2015 06:26 PM, Robert Collins wrote:

On 9 June 2015 at 07:53, Chris Friesen  wrote:

 From what I understand, the iterator (in the glance-api process) normally
breaks out of the while loop once the whole file has been read and the
read() call returns an empty string.

It's not clear to me how an error in the nova process (which causes it to
stop reading the file from glance-api)  will cause glance-api to break out
of the while loop in ChunkedFile.__iter__().


AIUI the conclusion of our IRC investigation was:
  - with caching middleware, the fd is freed, just after ~4m.
  - without caching middleware, the fd is freed after ~90s.


Correct. I went back and tried it out in a hardware lab and those are the 
results I got.  Sorry for the noise (though it'd be nice to get rid of the 
4-minute delay).


I did a bit more digging and we've apparently got another similar issue reported 
where taking a snapshot fails because the filesystem fills up and the space 
never gets reclaimed even though the snapshot failed.  I'll make sure I can 
reproduce that one before going further.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-08 Thread Robert Collins
On 9 June 2015 at 07:53, Chris Friesen  wrote:
> On 06/08/2015 12:30 PM, Robert Collins wrote:
>>
>> On 9 June 2015 at 03:50, Chris Friesen 
>> wrote:
>>>
>>> On 06/07/2015 04:22 PM, Robert Collins wrote:
>>
>>
>>> Hi, original reporter here.
>>>
>>> There's no LB involved.  The issue was noticed in a test lab that is
>>> tight
>>> on disk space.  When an instance failed to boot the person using the lab
>>> tried to delete some images to free up space, at which point it was
>>> noticed
>>> that space didn't actually free up.  (For at least half an hour, exact
>>> time
>>> unknown.)
>>>
>>> I'm more of a nova guy, so could you elaborate a bit on the GC?  Is
>>> something going to delete the ChunkedFile object after a certain amount
>>> of
>>> inactivity? What triggers the GC to run?
>>
>>
>> Ok, updating my theory...  I'm not super familiar with glances
>> innards, so I'm going to call out my assumptions.
>>
>> The ChunkedFile object is in the Nova process. It reads from a local
>> file, so its the issue - and it has a handle onto the image because
>> glance arranged for it to read from it.
>
>
> As I understand it, the ChunkedFile object is in the glance-api process.
> (At least, it's the glance-api process that is holding the open file
> descriptor.)
>
>
>> Anyhow - to answer your question: the iterator is only referenced by
>> the for loop, nothing else *can* hold a reference to it (without nasty
>> introspection hacks) and so as long as the iterator has an appropriate
>> try:finally:, which the filesystem ChunkedFile one does- the file will
>> be closed automatically.
>
>
> From what I understand, the iterator (in the glance-api process) normally
> breaks out of the while loop once the whole file has been read and the
> read() call returns an empty string.
>
> It's not clear to me how an error in the nova process (which causes it to
> stop reading the file from glance-api)  will cause glance-api to break out
> of the while loop in ChunkedFile.__iter__().

AIUI the conclusion of our IRC investigation was:
 - with caching middleware, the fd is freed, just after ~4m.
 - without caching middleware, the fd is freed after ~90s.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-08 Thread Chris Friesen

On 06/08/2015 12:30 PM, Robert Collins wrote:

On 9 June 2015 at 03:50, Chris Friesen  wrote:

On 06/07/2015 04:22 PM, Robert Collins wrote:



Hi, original reporter here.

There's no LB involved.  The issue was noticed in a test lab that is tight
on disk space.  When an instance failed to boot the person using the lab
tried to delete some images to free up space, at which point it was noticed
that space didn't actually free up.  (For at least half an hour, exact time
unknown.)

I'm more of a nova guy, so could you elaborate a bit on the GC?  Is
something going to delete the ChunkedFile object after a certain amount of
inactivity? What triggers the GC to run?


Ok, updating my theory...  I'm not super familiar with glances
innards, so I'm going to call out my assumptions.

The ChunkedFile object is in the Nova process. It reads from a local
file, so its the issue - and it has a handle onto the image because
glance arranged for it to read from it.


As I understand it, the ChunkedFile object is in the glance-api process.  (At 
least, it's the glance-api process that is holding the open file descriptor.)




Anyhow - to answer your question: the iterator is only referenced by
the for loop, nothing else *can* hold a reference to it (without nasty
introspection hacks) and so as long as the iterator has an appropriate
try:finally:, which the filesystem ChunkedFile one does- the file will
be closed automatically.


From what I understand, the iterator (in the glance-api process) normally 
breaks out of the while loop once the whole file has been read and the read() 
call returns an empty string.


It's not clear to me how an error in the nova process (which causes it to stop 
reading the file from glance-api)  will cause glance-api to break out of the 
while loop in ChunkedFile.__iter__().


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-08 Thread Robert Collins
On 9 June 2015 at 03:50, Chris Friesen  wrote:
> On 06/07/2015 04:22 PM, Robert Collins wrote:

> Hi, original reporter here.
>
> There's no LB involved.  The issue was noticed in a test lab that is tight
> on disk space.  When an instance failed to boot the person using the lab
> tried to delete some images to free up space, at which point it was noticed
> that space didn't actually free up.  (For at least half an hour, exact time
> unknown.)
>
> I'm more of a nova guy, so could you elaborate a bit on the GC?  Is
> something going to delete the ChunkedFile object after a certain amount of
> inactivity? What triggers the GC to run?

Ok, updating my theory...  I'm not super familiar with glances
innards, so I'm going to call out my assumptions.

The ChunkedFile object is in the Nova process. It reads from a local
file, so its the issue - and it has a handle onto the image because
glance arranged for it to read from it.

Nova runs out of space and throws in the middle of iterating the ChunkedFile.

the code in https://review.openstack.org/#/c/188179/1/nova/image/glance.py
(base) propogates that exception, hitting the
 finally:
 if close_file:
data.close()

on the way out.

the image_chunks iterator, which is being iterated by the for loop,
will be closed when the for loop is left because there are no other
references to the iterator object.

So the in-nova ChunkedFile object will have its close method called
due to the close() in the finally block of the __iter__ method
automatically when you except out of that exception. (demo below).
Assumption: glance_store/_drivers/filesystem.py is the ChunkedFile in
question; it might not be. If e.g. you are reading from the glance
HTTP service, then it might be a different class.

Anyhow - to answer your question: the iterator is only referenced by
the for loop, nothing else *can* hold a reference to it (without nasty
introspection hacks) and so as long as the iterator has an appropriate
try:finally:, which the filesystem ChunkedFile one does- the file will
be closed automatically.

So things to verify:
 - it is filesystem.ChunkedFile that you're seeing when you reproduce
it (within Nova). If not, what class is it :).

If its an HTTP handle of some sort, then we just need to make sure it
has an appropriate try:finally: within its generator. Assuming that is
that the contract with glance client is 'and you get an iterator' not
a 'file like object'.

>>> class f:
...   def __iter__(self):
... try:
...   yield 1
...   1/0
... finally:
...   print('yay')
...
>>> for x in f():
...   pass
...
yay
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 5, in __iter__
ZeroDivisionError: integer division or modulo by zero
>>> for x in f():
...   raise Exception()
...
yay
Traceback (most recent call last):
  File "", line 2, in 
Exception


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-08 Thread Chris Friesen

On 06/07/2015 04:22 PM, Robert Collins wrote:

On 6 June 2015 at 13:08, Ian Cordasco  wrote:



So the problem is with how we use ResponseSerializer and the ChunkedFile
(https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d
ata.py#n222). I think the problem we'll have is that webob provides
nothing on a Response
(https://webob.readthedocs.org/en/latest/modules/webob.html#response) to
hook into so we can close the ChunkedFile.

I wonder if we used the body_file attribute if webob would close the file
when the response is closed (because I'm assuming that nova/glanceclient
are closing the response with which it's downloading the data).


But the maximum leak time is a single GC run, which we don't expect to
be long, unless the server is super quiet (and if it is, the temporary
leak is less like to be an issue, no?).

I wonder if there's actually something else going on here. E.g. a
broken LB in front of glance-api which is preventing the HTTP
connection termination from being detected, and the thread is staying
open-and-stalled. That would explain the symptoms just as well - and
is deployer specific so could also explain the (perceived) trickiness
in reproduction.


Hi, original reporter here.

There's no LB involved.  The issue was noticed in a test lab that is tight on 
disk space.  When an instance failed to boot the person using the lab tried to 
delete some images to free up space, at which point it was noticed that space 
didn't actually free up.  (For at least half an hour, exact time unknown.)


I'm more of a nova guy, so could you elaborate a bit on the GC?  Is something 
going to delete the ChunkedFile object after a certain amount of inactivity? 
What triggers the GC to run?


Thanks,
Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-07 Thread Robert Collins
On 6 June 2015 at 13:08, Ian Cordasco  wrote:
>
>
> On 6/5/15, 02:55, "Flavio Percoco"  wrote:
>
>>On 04/06/15 11:46 -0600, Chris Friesen wrote:
>>>On 06/04/2015 03:01 AM, Flavio Percoco wrote:
On 03/06/15 16:46 -0600, Chris Friesen wrote:
>We recently ran into an issue where nova couldn't write an image file
>due to
>lack of space and so just quit reading from glance.
>
>This caused glance to be stuck with an open file descriptor, which
>meant that
>the image consumed space even after it was deleted.
>
>I have a crude fix for nova at
>"https://review.openstack.org/#/c/188179/";
>which basically continues to read the image even though it can't write
>it.
>That seems less than ideal for large images though.
>
>Is there a better way to do this?  Is there a way for nova to indicate
>to
>glance that it's no longer interested in that image and glance can
>close the
>file?
>
>If I've followed this correctly, on the glance side I think the code in
>question is ultimately
>glance_store._drivers.filesystem.ChunkedFile.__iter__().

Actually, to be honest, I was quite confused by the email :P

Correct me if I still didn't understand what you're asking.

You ran out of space on the Nova side while downloading the image and
there's a file descriptor leak somewhere either in that lovely (sarcasm)
glance wrapper or in glanceclient.
>>>
>>>The first part is correct, but the file descriptor is actually held by
>>>glance-api.
>>>
Just by reading your email and glancing your patch, I believe the bug
might be in glanceclient but I'd need to five into this. The piece of
code you'll need to look into is[0].

glance_store is just used server side. If that's what you meant -
glance is keeping the request and the ChunkedFile around - then yes,
glance_store is the place to look into.

[0]
https://github.com/openstack/python-glanceclient/blob/master/glanceclien
t/v1/images.py#L152
>>>
>>>I believe what's happening is that the ChunkedFile code opens the file
>>>and creates the iterator.  Nova then starts iterating through the
>>>file.
>>>
>>>If nova (or any other user of glance) iterates all the way through the
>>>file then the ChunkedFile code will hit the "finally" clause in
>>>__iter__() and close the file descriptor.
>>>
>>>If nova starts iterating through the file and then stops (due to
>>>running out of room, for example), the ChunkedFile.__iter__() routine
>>>is left with an open file descriptor.  At this point deleting the
>>>image will not actually free up any space.
>>>
>>>I'm not a glance guy so I could be wrong about the code.  The
>>>externally-visible data are:
>>>1) glance-api is holding an open file descriptor to a deleted image file
>>>2) If I kill glance-api the disk space is freed up.
>>>3) If I modify nova to always finish iterating through the file the
>>>problem doesn't occur in the first place.
>>
>>Gotcha, thanks for explaining. I think the problem is that there might
>>be a reference leak and therefore the FD is kept opened. Probably the
>>request interruption is not getting to the driver. I've filed this
>>bug[0] so we can look into it.
>>
>>[0] https://bugs.launchpad.net/glance-store/+bug/1462235
>>
>>Flavio
>>
>>--
>>@flaper87
>>Flavio Percoco
>
> So the problem is with how we use ResponseSerializer and the ChunkedFile
> (https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d
> ata.py#n222). I think the problem we'll have is that webob provides
> nothing on a Response
> (https://webob.readthedocs.org/en/latest/modules/webob.html#response) to
> hook into so we can close the ChunkedFile.
>
> I wonder if we used the body_file attribute if webob would close the file
> when the response is closed (because I'm assuming that nova/glanceclient
> are closing the response with which it's downloading the data).

But the maximum leak time is a single GC run, which we don't expect to
be long, unless the server is super quiet (and if it is, the temporary
leak is less like to be an issue, no?).

I wonder if there's actually something else going on here. E.g. a
broken LB in front of glance-api which is preventing the HTTP
connection termination from being detected, and the thread is staying
open-and-stalled. That would explain the symptoms just as well - and
is deployer specific so could also explain the (perceived) trickiness
in reproduction.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-05 Thread Ian Cordasco


On 6/5/15, 02:55, "Flavio Percoco"  wrote:

>On 04/06/15 11:46 -0600, Chris Friesen wrote:
>>On 06/04/2015 03:01 AM, Flavio Percoco wrote:
>>>On 03/06/15 16:46 -0600, Chris Friesen wrote:
We recently ran into an issue where nova couldn't write an image file
due to
lack of space and so just quit reading from glance.

This caused glance to be stuck with an open file descriptor, which
meant that
the image consumed space even after it was deleted.

I have a crude fix for nova at
"https://review.openstack.org/#/c/188179/";
which basically continues to read the image even though it can't write
it.
That seems less than ideal for large images though.

Is there a better way to do this?  Is there a way for nova to indicate
to
glance that it's no longer interested in that image and glance can
close the
file?

If I've followed this correctly, on the glance side I think the code in
question is ultimately
glance_store._drivers.filesystem.ChunkedFile.__iter__().
>>>
>>>Actually, to be honest, I was quite confused by the email :P
>>>
>>>Correct me if I still didn't understand what you're asking.
>>>
>>>You ran out of space on the Nova side while downloading the image and
>>>there's a file descriptor leak somewhere either in that lovely (sarcasm)
>>>glance wrapper or in glanceclient.
>>
>>The first part is correct, but the file descriptor is actually held by
>>glance-api.
>>
>>>Just by reading your email and glancing your patch, I believe the bug
>>>might be in glanceclient but I'd need to five into this. The piece of
>>>code you'll need to look into is[0].
>>>
>>>glance_store is just used server side. If that's what you meant -
>>>glance is keeping the request and the ChunkedFile around - then yes,
>>>glance_store is the place to look into.
>>>
>>>[0]
>>>https://github.com/openstack/python-glanceclient/blob/master/glanceclien
>>>t/v1/images.py#L152
>>
>>I believe what's happening is that the ChunkedFile code opens the file
>>and creates the iterator.  Nova then starts iterating through the
>>file.
>>
>>If nova (or any other user of glance) iterates all the way through the
>>file then the ChunkedFile code will hit the "finally" clause in
>>__iter__() and close the file descriptor.
>>
>>If nova starts iterating through the file and then stops (due to
>>running out of room, for example), the ChunkedFile.__iter__() routine
>>is left with an open file descriptor.  At this point deleting the
>>image will not actually free up any space.
>>
>>I'm not a glance guy so I could be wrong about the code.  The
>>externally-visible data are:
>>1) glance-api is holding an open file descriptor to a deleted image file
>>2) If I kill glance-api the disk space is freed up.
>>3) If I modify nova to always finish iterating through the file the
>>problem doesn't occur in the first place.
>
>Gotcha, thanks for explaining. I think the problem is that there might
>be a reference leak and therefore the FD is kept opened. Probably the
>request interruption is not getting to the driver. I've filed this
>bug[0] so we can look into it.
>
>[0] https://bugs.launchpad.net/glance-store/+bug/1462235
>
>Flavio
>
>-- 
>@flaper87
>Flavio Percoco

So the problem is with how we use ResponseSerializer and the ChunkedFile
(https://git.openstack.org/cgit/openstack/glance/tree/glance/api/v2/image_d
ata.py#n222). I think the problem we'll have is that webob provides
nothing on a Response
(https://webob.readthedocs.org/en/latest/modules/webob.html#response) to
hook into so we can close the ChunkedFile.

I wonder if we used the body_file attribute if webob would close the file
when the response is closed (because I'm assuming that nova/glanceclient
are closing the response with which it's downloading the data).

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-05 Thread Flavio Percoco

On 04/06/15 11:46 -0600, Chris Friesen wrote:

On 06/04/2015 03:01 AM, Flavio Percoco wrote:

On 03/06/15 16:46 -0600, Chris Friesen wrote:

We recently ran into an issue where nova couldn't write an image file due to
lack of space and so just quit reading from glance.

This caused glance to be stuck with an open file descriptor, which meant that
the image consumed space even after it was deleted.

I have a crude fix for nova at "https://review.openstack.org/#/c/188179/";
which basically continues to read the image even though it can't write it.
That seems less than ideal for large images though.

Is there a better way to do this?  Is there a way for nova to indicate to
glance that it's no longer interested in that image and glance can close the
file?

If I've followed this correctly, on the glance side I think the code in
question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__().


Actually, to be honest, I was quite confused by the email :P

Correct me if I still didn't understand what you're asking.

You ran out of space on the Nova side while downloading the image and
there's a file descriptor leak somewhere either in that lovely (sarcasm)
glance wrapper or in glanceclient.


The first part is correct, but the file descriptor is actually held by 
glance-api.


Just by reading your email and glancing your patch, I believe the bug
might be in glanceclient but I'd need to five into this. The piece of
code you'll need to look into is[0].

glance_store is just used server side. If that's what you meant -
glance is keeping the request and the ChunkedFile around - then yes,
glance_store is the place to look into.

[0]
https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152


I believe what's happening is that the ChunkedFile code opens the file 
and creates the iterator.  Nova then starts iterating through the 
file.


If nova (or any other user of glance) iterates all the way through the 
file then the ChunkedFile code will hit the "finally" clause in 
__iter__() and close the file descriptor.


If nova starts iterating through the file and then stops (due to 
running out of room, for example), the ChunkedFile.__iter__() routine 
is left with an open file descriptor.  At this point deleting the 
image will not actually free up any space.


I'm not a glance guy so I could be wrong about the code.  The 
externally-visible data are:

1) glance-api is holding an open file descriptor to a deleted image file
2) If I kill glance-api the disk space is freed up.
3) If I modify nova to always finish iterating through the file the 
problem doesn't occur in the first place.


Gotcha, thanks for explaining. I think the problem is that there might
be a reference leak and therefore the FD is kept opened. Probably the
request interruption is not getting to the driver. I've filed this
bug[0] so we can look into it.

[0] https://bugs.launchpad.net/glance-store/+bug/1462235

Flavio

--
@flaper87
Flavio Percoco


pgpqbO1TwobzI.pgp
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-04 Thread Chris Friesen

On 06/04/2015 07:16 PM, Joshua Harlow wrote:

Perhaps someone needs to use (or forgot to use) contextlib.closing?

https://docs.python.org/2/library/contextlib.html#contextlib.closing

Or contextlib2 also may be useful:

http://contextlib2.readthedocs.org/en/latest/#contextlib2.ExitStack


The complication is that the error happens in nova, but it's glance-api that 
holds the file descriptor open.


So somehow the error has to be detected in nova, then the fact that nova is no 
longer interested in the file needs to be propagated through glanceclient via an 
HTTP call into glance-api so that it knows it can close the file descriptor. 
And I don't think that an API to do this exists currently.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-04 Thread Joshua Harlow

Perhaps someone needs to use (or forgot to use) contextlib.closing?

https://docs.python.org/2/library/contextlib.html#contextlib.closing

Or contextlib2 also may be useful:

http://contextlib2.readthedocs.org/en/latest/#contextlib2.ExitStack

-Josh

Chris Friesen wrote:

On 06/04/2015 03:01 AM, Flavio Percoco wrote:

On 03/06/15 16:46 -0600, Chris Friesen wrote:

We recently ran into an issue where nova couldn't write an image file
due to
lack of space and so just quit reading from glance.

This caused glance to be stuck with an open file descriptor, which
meant that
the image consumed space even after it was deleted.

I have a crude fix for nova at
"https://review.openstack.org/#/c/188179/";
which basically continues to read the image even though it can't
write it.
That seems less than ideal for large images though.

Is there a better way to do this? Is there a way for nova to indicate to
glance that it's no longer interested in that image and glance can
close the
file?

If I've followed this correctly, on the glance side I think the code in
question is ultimately
glance_store._drivers.filesystem.ChunkedFile.__iter__().


Actually, to be honest, I was quite confused by the email :P

Correct me if I still didn't understand what you're asking.

You ran out of space on the Nova side while downloading the image and
there's a file descriptor leak somewhere either in that lovely (sarcasm)
glance wrapper or in glanceclient.


The first part is correct, but the file descriptor is actually held by
glance-api.


Just by reading your email and glancing your patch, I believe the bug
might be in glanceclient but I'd need to five into this. The piece of
code you'll need to look into is[0].

glance_store is just used server side. If that's what you meant -
glance is keeping the request and the ChunkedFile around - then yes,
glance_store is the place to look into.

[0]
https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152



I believe what's happening is that the ChunkedFile code opens the file
and creates the iterator. Nova then starts iterating through the file.

If nova (or any other user of glance) iterates all the way through the
file then the ChunkedFile code will hit the "finally" clause in
__iter__() and close the file descriptor.

If nova starts iterating through the file and then stops (due to running
out of room, for example), the ChunkedFile.__iter__() routine is left
with an open file descriptor. At this point deleting the image will not
actually free up any space.

I'm not a glance guy so I could be wrong about the code. The
externally-visible data are:
1) glance-api is holding an open file descriptor to a deleted image file
2) If I kill glance-api the disk space is freed up.
3) If I modify nova to always finish iterating through the file the
problem doesn't occur in the first place.

Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-04 Thread Chris Friesen

On 06/04/2015 03:01 AM, Flavio Percoco wrote:

On 03/06/15 16:46 -0600, Chris Friesen wrote:

We recently ran into an issue where nova couldn't write an image file due to
lack of space and so just quit reading from glance.

This caused glance to be stuck with an open file descriptor, which meant that
the image consumed space even after it was deleted.

I have a crude fix for nova at "https://review.openstack.org/#/c/188179/";
which basically continues to read the image even though it can't write it.
That seems less than ideal for large images though.

Is there a better way to do this?  Is there a way for nova to indicate to
glance that it's no longer interested in that image and glance can close the
file?

If I've followed this correctly, on the glance side I think the code in
question is ultimately glance_store._drivers.filesystem.ChunkedFile.__iter__().


Actually, to be honest, I was quite confused by the email :P

Correct me if I still didn't understand what you're asking.

You ran out of space on the Nova side while downloading the image and
there's a file descriptor leak somewhere either in that lovely (sarcasm)
glance wrapper or in glanceclient.


The first part is correct, but the file descriptor is actually held by 
glance-api.


Just by reading your email and glancing your patch, I believe the bug
might be in glanceclient but I'd need to five into this. The piece of
code you'll need to look into is[0].

glance_store is just used server side. If that's what you meant -
glance is keeping the request and the ChunkedFile around - then yes,
glance_store is the place to look into.

[0]
https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152


I believe what's happening is that the ChunkedFile code opens the file and 
creates the iterator.  Nova then starts iterating through the file.


If nova (or any other user of glance) iterates all the way through the file then 
the ChunkedFile code will hit the "finally" clause in __iter__() and close the 
file descriptor.


If nova starts iterating through the file and then stops (due to running out of 
room, for example), the ChunkedFile.__iter__() routine is left with an open file 
descriptor.  At this point deleting the image will not actually free up any space.


I'm not a glance guy so I could be wrong about the code.  The externally-visible 
data are:

1) glance-api is holding an open file descriptor to a deleted image file
2) If I kill glance-api the disk space is freed up.
3) If I modify nova to always finish iterating through the file the problem 
doesn't occur in the first place.


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [glance] How to deal with aborted image read?

2015-06-04 Thread Flavio Percoco

On 03/06/15 16:46 -0600, Chris Friesen wrote:
We recently ran into an issue where nova couldn't write an image file 
due to lack of space and so just quit reading from glance.


This caused glance to be stuck with an open file descriptor, which 
meant that the image consumed space even after it was deleted.


I have a crude fix for nova at 
"https://review.openstack.org/#/c/188179/"; which basically continues 
to read the image even though it can't write it.  That seems less than 
ideal for large images though.


Is there a better way to do this?  Is there a way for nova to indicate 
to glance that it's no longer interested in that image and glance can 
close the file?


If I've followed this correctly, on the glance side I think the code 
in question is ultimately 
glance_store._drivers.filesystem.ChunkedFile.__iter__().


Actually, to be honest, I was quite confused by the email :P

Correct me if I still didn't understand what you're asking.

You ran out of space on the Nova side while downloading the image and
there's a file descriptor leak somewhere either in that lovely (sarcasm)
glance wrapper or in glanceclient.

Just by reading your email and glancing your patch, I believe the bug
might be in glanceclient but I'd need to five into this. The piece of
code you'll need to look into is[0].

glance_store is just used server side. If that's what you meant -
glance is keeping the request and the ChunkedFile around - then yes,
glance_store is the place to look into.

[0] 
https://github.com/openstack/python-glanceclient/blob/master/glanceclient/v1/images.py#L152

Cheers,
Flavio

--
@flaper87
Flavio Percoco


pgp0DL1sr4LBf.pgp
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev