Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
A library that seems to have a pretty nice abstraction for this kind of thing/pattern, could be an idea to use it, or have something similar like it... https://boltons.readthedocs.org/en/latest/fileutils.html#atomic-file-saving -Josh On Sat, 26 Sep 2015 10:48:39 +0200 Julien Danjou wrote: > On Tue, Sep 22 2015, Chris Friesen wrote: > > > On 09/22/2015 05:48 PM, Joshua Harlow wrote: > >> A present: > >> > >> >>> import contextlib > >> >>> import os > >> >>> > >> >>> @contextlib.contextmanager > >> ... def synced_file(path, mode='wb'): > >> ... with open(path, mode) as fh: > >> ... yield fh > >> ... os.fdatasync(fh.fileno()) > >> ... > >> >>> with synced_file("/tmp/b.txt") as fh: > >> ...fh.write("b") > > > > Isn't that missing an "fh.flush()" somewhere before the fdatasync()? > > Unless proven otherwise, close() does a flush(). > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/26/2015 02:48 AM, Julien Danjou wrote: On Tue, Sep 22 2015, Chris Friesen wrote: On 09/22/2015 05:48 PM, Joshua Harlow wrote: A present: >>> import contextlib >>> import os >>> >>> @contextlib.contextmanager ... def synced_file(path, mode='wb'): ... with open(path, mode) as fh: ... yield fh ... os.fdatasync(fh.fileno()) ... >>> with synced_file("/tmp/b.txt") as fh: ...fh.write("b") Isn't that missing an "fh.flush()" somewhere before the fdatasync()? Unless proven otherwise, close() does a flush(). There's no close() before the fdatasync() in the above code. (And it wouldn't make sense anyway because you need the open fd to do the fdatasync().) Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On Tue, Sep 22 2015, Chris Friesen wrote: > On 09/22/2015 05:48 PM, Joshua Harlow wrote: >> A present: >> >> >>> import contextlib >> >>> import os >> >>> >> >>> @contextlib.contextmanager >> ... def synced_file(path, mode='wb'): >> ... with open(path, mode) as fh: >> ... yield fh >> ... os.fdatasync(fh.fileno()) >> ... >> >>> with synced_file("/tmp/b.txt") as fh: >> ...fh.write("b") > > Isn't that missing an "fh.flush()" somewhere before the fdatasync()? Unless proven otherwise, close() does a flush(). -- Julien Danjou ;; Free Software hacker ;; http://julien.danjou.info signature.asc Description: PGP signature __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
> -Original Message- > From: Chris Friesen [mailto:chris.frie...@windriver.com] > Sent: Friday, September 25, 2015 3:04 PM > To: openstack-dev@lists.openstack.org > Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi > config file? > > On 09/25/2015 12:30 PM, Mitsuhiro Tanino wrote: > > On 09/22/2015 06:43 PM, Robert Collins wrote: > >> On 23 September 2015 at 09:52, Chris Friesen > >> wrote: > >>> Hi, > >>> > >>> I recently had an issue with one file out of a dozen or so in > >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero. > >>> I'm running stable/kilo if it makes a difference. > >>> > >>> Looking at the code in > >>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we > >>> should do a fsync() before the close(). The way it stands now, it > >>> seems like it might be possible to write the file, start making use > >>> of it, and then take a power outage before it actually gets written > >>> to persistent storage. When we come back up we could have an > >>> instance expecting to make use of it, but no target information in the on- > disk copy of the file. > > > > I think even if there is no target information in configuration file > > dir, c-vol started successfully and iSCSI targets were created automatically > and volumes were exported, right? > > > > There is an problem in this case that the iSCSI target was created > > without authentication because we can't get previous authentication from the > configuration file. > > > > I'm curious what kind of problem did you met? > > We had an issue in a private patch that was ported to Kilo without realizing > that the data type of chap_auth had changed. I understand. Thank you for your explanation. > > In my understanding, the provider_auth in database has user name and > > password > for iSCSI target. > > Therefore if we get authentication from DB, I think we can self-heal > > from this situation correctly after c-vol service is restarted. > > > > The lio target obtains authentication from provider_auth in database, > > but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target > when c-vol is restarted. > > If the file is missing, these volumes are exported without > > authentication and configuration file is recreated as I mentioned above. > > > > tgtd: Get target chap auth from file > > iet: Get target chap auth from file > > cxt: Get target chap auth from file > > lio: Get target chap auth from Database(in provider_auth) > > scst: Get target chap auth by using original command > > > > If we get authentication from DB for tgtd, iet and cxt same as lio, we > > can recreate iSCSI target with proper authentication when c-vol is > > restarted. > > I think this is a solution for this situation. > > If we fixed the chap auth info then we could live with a zero-size file. > However, with the current code if we take a kernel panic or power outage it's > theoretically possible to end up with a corrupt file of nonzero size (due to > metadata hitting the persistant storage before the data). I'm not confident > that the current code would deal properly with that. > > That said, if we always regenerate every file from the DB on cinder-volume > startup (regardless of whether or not it existed, and without reading in the > existing file), then we'd be okay without the robustness improvements. This file is referred when the SCSI target service is restarted. Therefore, adding robustness for this file is also good approach. IMO. Thanks, Mitsuhiro Tanino > Chris > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_cgi- > 2Dbin_mailman_listinfo_openstack-2Ddev&d=BQICAg&c=DZ- > EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SZPjS9uXH42q > hmzSRbSZ8x39C9xi3aBDw-SQ7xa8cTM&s=XWJ91NIJglFkBSr762rSq9TdWeiRSdS5Pl0LzS1_1Z8&e= __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
> -Original Message- > From: Eric Harney [mailto:ehar...@redhat.com] > Sent: Friday, September 25, 2015 2:56 PM > To: OpenStack Development Mailing List (not for usage questions) > Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi > config file? > > On 09/25/2015 02:30 PM, Mitsuhiro Tanino wrote: > > On 09/22/2015 06:43 PM, Robert Collins wrote: > >> On 23 September 2015 at 09:52, Chris Friesen > >> wrote: > >>> Hi, > >>> > >>> I recently had an issue with one file out of a dozen or so in > >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero. > >>> I'm running stable/kilo if it makes a difference. > >>> > >>> Looking at the code in > >>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we > >>> should do a fsync() before the close(). The way it stands now, it > >>> seems like it might be possible to write the file, start making use > >>> of it, and then take a power outage before it actually gets written > >>> to persistent storage. When we come back up we could have an > >>> instance expecting to make use of it, but no target information in the on- > disk copy of the file. > > > > I think even if there is no target information in configuration file > > dir, c-vol started successfully and iSCSI targets were created automatically > and volumes were exported, right? > > > > There is an problem in this case that the iSCSI target was created > > without authentication because we can't get previous authentication from the > configuration file. > > > > I'm curious what kind of problem did you met? > > > >> If its being kept in sync with DB records, and won't self-heal from > >> this situation, then yes. e.g. if the overall workflow is something > >> like > > > > In my understanding, the provider_auth in database has user name and > > password > for iSCSI target. > > Therefore if we get authentication from DB, I think we can self-heal > > from this situation correctly after c-vol service is restarted. > > > > Is this not already done as-needed by ensure_export()? Yes. This logic is in the ensure_export but only lio target uses DB and other targets use file. > > The lio target obtains authentication from provider_auth in database, > > but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target > when c-vol is restarted. > > If the file is missing, these volumes are exported without > > authentication and configuration file is recreated as I mentioned above. > > > > tgtd: Get target chap auth from file > > iet: Get target chap auth from file > > cxt: Get target chap auth from file > > lio: Get target chap auth from Database(in provider_auth) > > scst: Get target chap auth by using original command > > > > If we get authentication from DB for tgtd, iet and cxt same as lio, we > > can recreate iSCSI target with proper authentication when c-vol is > > restarted. > > I think this is a solution for this situation. > > > > This may be possible, but fixing the target config file to be written more > safely to work as currently intended is still a win. I think it is better to fix both of them, (1) Add a logic to write configuration file using fsync (2) Read authentication from database during ensure_export() same as lio target. Thanks, Mitsuhiro Tanino > > Any thought? > > > > Thanks, > > Mitsuhiro Tanino > > > >> -Original Message- > >> From: Chris Friesen [mailto:chris.frie...@windriver.com] > >> Sent: Friday, September 25, 2015 12:48 PM > >> To: openstack-dev@lists.openstack.org > >> Subject: Re: [openstack-dev] [cinder] should we use fsync when > >> writing iscsi config file? > >> > >> On 09/24/2015 04:21 PM, Chris Friesen wrote: > >>> On 09/24/2015 12:18 PM, Chris Friesen wrote: > >>> > >>>> > >>>> I think what happened is that we took the SIGTERM after the open() > >>>> call in create_iscsi_target(), but before writing anything to the file. > >>>> > >>>> f = open(volume_path, 'w+') > >>>> f.write(volume_conf) > >>>> f.close() > >>>> > >>>> The 'w+' causes the file to be immediately truncated on opening, > >>>> leading to an empty file. > >>>> > >>>> To work around this, I think we need to
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/25/2015 12:30 PM, Mitsuhiro Tanino wrote: On 09/22/2015 06:43 PM, Robert Collins wrote: On 23 September 2015 at 09:52, Chris Friesen wrote: Hi, I recently had an issue with one file out of a dozen or so in "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm running stable/kilo if it makes a difference. Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we should do a fsync() before the close(). The way it stands now, it seems like it might be possible to write the file, start making use of it, and then take a power outage before it actually gets written to persistent storage. When we come back up we could have an instance expecting to make use of it, but no target information in the on-disk copy of the file. I think even if there is no target information in configuration file dir, c-vol started successfully and iSCSI targets were created automatically and volumes were exported, right? There is an problem in this case that the iSCSI target was created without authentication because we can't get previous authentication from the configuration file. I'm curious what kind of problem did you met? We had an issue in a private patch that was ported to Kilo without realizing that the data type of chap_auth had changed. In my understanding, the provider_auth in database has user name and password for iSCSI target. Therefore if we get authentication from DB, I think we can self-heal from this situation correctly after c-vol service is restarted. The lio target obtains authentication from provider_auth in database, but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target when c-vol is restarted. If the file is missing, these volumes are exported without authentication and configuration file is recreated as I mentioned above. tgtd: Get target chap auth from file iet: Get target chap auth from file cxt: Get target chap auth from file lio: Get target chap auth from Database(in provider_auth) scst: Get target chap auth by using original command If we get authentication from DB for tgtd, iet and cxt same as lio, we can recreate iSCSI target with proper authentication when c-vol is restarted. I think this is a solution for this situation. If we fixed the chap auth info then we could live with a zero-size file. However, with the current code if we take a kernel panic or power outage it's theoretically possible to end up with a corrupt file of nonzero size (due to metadata hitting the persistant storage before the data). I'm not confident that the current code would deal properly with that. That said, if we always regenerate every file from the DB on cinder-volume startup (regardless of whether or not it existed, and without reading in the existing file), then we'd be okay without the robustness improvements. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/25/2015 02:30 PM, Mitsuhiro Tanino wrote: > On 09/22/2015 06:43 PM, Robert Collins wrote: >> On 23 September 2015 at 09:52, Chris Friesen >> wrote: >>> Hi, >>> >>> I recently had an issue with one file out of a dozen or so in >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm >>> running stable/kilo if it makes a difference. >>> >>> Looking at the code in >>> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we >>> should do a fsync() before the close(). The way it stands now, it >>> seems like it might be possible to write the file, start making use >>> of it, and then take a power outage before it actually gets written >>> to persistent storage. When we come back up we could have an >>> instance expecting to make use of it, but no target information in the >>> on-disk copy of the file. > > I think even if there is no target information in configuration file dir, > c-vol started successfully > and iSCSI targets were created automatically and volumes were exported, right? > > There is an problem in this case that the iSCSI target was created without > authentication because > we can't get previous authentication from the configuration file. > > I'm curious what kind of problem did you met? > >> If its being kept in sync with DB records, and won't self-heal from >> this situation, then yes. e.g. if the overall workflow is something >> like > > In my understanding, the provider_auth in database has user name and password > for iSCSI target. > Therefore if we get authentication from DB, I think we can self-heal from > this situation > correctly after c-vol service is restarted. > Is this not already done as-needed by ensure_export()? > The lio target obtains authentication from provider_auth in database, but > tgtd, iet, cxt obtain > authentication from file to recreate iSCSI target when c-vol is restarted. > If the file is missing, these volumes are exported without authentication and > configuration > file is recreated as I mentioned above. > > tgtd: Get target chap auth from file > iet: Get target chap auth from file > cxt: Get target chap auth from file > lio: Get target chap auth from Database(in provider_auth) > scst: Get target chap auth by using original command > > If we get authentication from DB for tgtd, iet and cxt same as lio, we can > recreate iSCSI target > with proper authentication when c-vol is restarted. > I think this is a solution for this situation. > This may be possible, but fixing the target config file to be written more safely to work as currently intended is still a win. > Any thought? > > Thanks, > Mitsuhiro Tanino > >> -Original Message- >> From: Chris Friesen [mailto:chris.frie...@windriver.com] >> Sent: Friday, September 25, 2015 12:48 PM >> To: openstack-dev@lists.openstack.org >> Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi >> config file? >> >> On 09/24/2015 04:21 PM, Chris Friesen wrote: >>> On 09/24/2015 12:18 PM, Chris Friesen wrote: >>> >>>> >>>> I think what happened is that we took the SIGTERM after the open() >>>> call in create_iscsi_target(), but before writing anything to the file. >>>> >>>> f = open(volume_path, 'w+') >>>> f.write(volume_conf) >>>> f.close() >>>> >>>> The 'w+' causes the file to be immediately truncated on opening, >>>> leading to an empty file. >>>> >>>> To work around this, I think we need to do the classic "write to a >>>> temporary file and then rename it to the desired filename" trick. >>>> The atomicity of the rename ensures that either the old contents or the new >> contents are present. >>> >>> I'm pretty sure that upstream code is still susceptible to zeroing out >>> the file in the above scenario. However, it doesn't take an >>> exception--that's due to a local change on our part that attempted to fix >>> the >> below issue. >>> >>> The stable/kilo code *does* have a problem in that when it regenerates >>> the file it's missing the CHAP authentication line (beginning with >> "incominguser"). >> >> I've proposed a change at https://urldefense.proofpoint.com/v2/url?u=https- >> 3A__review.openstack.org_-23_c_227943_&d=BQICAg&c=DZ- >> EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_ >> NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=q2_8XBAVH9lQ2mdT72nW7dN2EafIqJEpHGLBuf4K970&e= >> >> If anyone has suggestions on how to do this more robustly or more cleanly, >> please let me know. >> >> Chris >> __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/22/2015 06:43 PM, Robert Collins wrote: > On 23 September 2015 at 09:52, Chris Friesen > wrote: >> Hi, >> >> I recently had an issue with one file out of a dozen or so in >> "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm >> running stable/kilo if it makes a difference. >> >> Looking at the code in >> volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we >> should do a fsync() before the close(). The way it stands now, it >> seems like it might be possible to write the file, start making use >> of it, and then take a power outage before it actually gets written >> to persistent storage. When we come back up we could have an >> instance expecting to make use of it, but no target information in the >> on-disk copy of the file. I think even if there is no target information in configuration file dir, c-vol started successfully and iSCSI targets were created automatically and volumes were exported, right? There is an problem in this case that the iSCSI target was created without authentication because we can't get previous authentication from the configuration file. I'm curious what kind of problem did you met? > If its being kept in sync with DB records, and won't self-heal from > this situation, then yes. e.g. if the overall workflow is something > like In my understanding, the provider_auth in database has user name and password for iSCSI target. Therefore if we get authentication from DB, I think we can self-heal from this situation correctly after c-vol service is restarted. The lio target obtains authentication from provider_auth in database, but tgtd, iet, cxt obtain authentication from file to recreate iSCSI target when c-vol is restarted. If the file is missing, these volumes are exported without authentication and configuration file is recreated as I mentioned above. tgtd: Get target chap auth from file iet: Get target chap auth from file cxt: Get target chap auth from file lio: Get target chap auth from Database(in provider_auth) scst: Get target chap auth by using original command If we get authentication from DB for tgtd, iet and cxt same as lio, we can recreate iSCSI target with proper authentication when c-vol is restarted. I think this is a solution for this situation. Any thought? Thanks, Mitsuhiro Tanino > -Original Message- > From: Chris Friesen [mailto:chris.frie...@windriver.com] > Sent: Friday, September 25, 2015 12:48 PM > To: openstack-dev@lists.openstack.org > Subject: Re: [openstack-dev] [cinder] should we use fsync when writing iscsi > config file? > > On 09/24/2015 04:21 PM, Chris Friesen wrote: > > On 09/24/2015 12:18 PM, Chris Friesen wrote: > > > >> > >> I think what happened is that we took the SIGTERM after the open() > >> call in create_iscsi_target(), but before writing anything to the file. > >> > >> f = open(volume_path, 'w+') > >> f.write(volume_conf) > >> f.close() > >> > >> The 'w+' causes the file to be immediately truncated on opening, > >> leading to an empty file. > >> > >> To work around this, I think we need to do the classic "write to a > >> temporary file and then rename it to the desired filename" trick. > >> The atomicity of the rename ensures that either the old contents or the new > contents are present. > > > > I'm pretty sure that upstream code is still susceptible to zeroing out > > the file in the above scenario. However, it doesn't take an > > exception--that's due to a local change on our part that attempted to fix > > the > below issue. > > > > The stable/kilo code *does* have a problem in that when it regenerates > > the file it's missing the CHAP authentication line (beginning with > "incominguser"). > > I've proposed a change at https://urldefense.proofpoint.com/v2/url?u=https- > 3A__review.openstack.org_-23_c_227943_&d=BQICAg&c=DZ- > EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_ > NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=q2_8XBAVH9lQ2mdT72nW7dN2EafIqJEpHGLBuf4K970&e= > > If anyone has suggestions on how to do this more robustly or more cleanly, > please let me know. > > Chris > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_cgi- > 2Dbin_mailman_listinfo_openstack-2Ddev&d=BQICAg&c=DZ- > EF4pZfxGSU6MfABwx0g&r=klD1krzABGW034E9oBtY1xmIn3g7xZAIxV0XxaZpkJE&m=SVlOqKiqO04_ > NttKUIoDiaOR7cePB0SOA1bpjakqAss&s=0DBbmeXSIK2c5QlBnwURY1iwNN1AXuqOLaUYnxjBl0w&e= __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/24/2015 04:21 PM, Chris Friesen wrote: On 09/24/2015 12:18 PM, Chris Friesen wrote: I think what happened is that we took the SIGTERM after the open() call in create_iscsi_target(), but before writing anything to the file. f = open(volume_path, 'w+') f.write(volume_conf) f.close() The 'w+' causes the file to be immediately truncated on opening, leading to an empty file. To work around this, I think we need to do the classic "write to a temporary file and then rename it to the desired filename" trick. The atomicity of the rename ensures that either the old contents or the new contents are present. I'm pretty sure that upstream code is still susceptible to zeroing out the file in the above scenario. However, it doesn't take an exception--that's due to a local change on our part that attempted to fix the below issue. The stable/kilo code *does* have a problem in that when it regenerates the file it's missing the CHAP authentication line (beginning with "incominguser"). I've proposed a change at https://review.openstack.org/#/c/227943/ If anyone has suggestions on how to do this more robustly or more cleanly, please let me know. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/24/2015 12:18 PM, Chris Friesen wrote: I think what happened is that we took the SIGTERM after the open() call in create_iscsi_target(), but before writing anything to the file. f = open(volume_path, 'w+') f.write(volume_conf) f.close() The 'w+' causes the file to be immediately truncated on opening, leading to an empty file. To work around this, I think we need to do the classic "write to a temporary file and then rename it to the desired filename" trick. The atomicity of the rename ensures that either the old contents or the new contents are present. I'm pretty sure that upstream code is still susceptible to zeroing out the file in the above scenario. However, it doesn't take an exception--that's due to a local change on our part that attempted to fix the below issue. The stable/kilo code *does* have a problem in that when it regenerates the file it's missing the CHAP authentication line (beginning with "incominguser"). Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/24/2015 10:54 AM, Chris Friesen wrote: I took another look at the code and realized that the file *should* get rebuilt on restart after a power outage--if the file already exists it will print a warning message in the logs but it should still overwrite the contents of the file with the desired contents. However, that didn't happen in my case. That made me confused about how I ever ended up with an empty persistence file. I went back to my logs and found this: File "./usr/lib64/python2.7/site-packages/cinder/volume/manager.py", line 334, in init_host File "/usr/lib64/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper File "./usr/lib64/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 603, in ensure_export File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/iscsi.py", line 296, in ensure_export File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/tgt.py", line 185, in create_iscsi_target TypeError: not enough arguments for format string So it seems like we might have a bug in the handling of an empty file. And I think I know how we got the empty file in the first place, and it wasn't the original file creation but rather the file re-creation. I have logs from shortly before the above logs showing cinder-volume receiving a SIGTERM while it was processing the volume in question: 2015-09-21 19:23:59.123 12429 WARNING cinder.volume.targets.tgt [req-7d092503-198a-4f59-97e9-d4d520d38379 - - - - -] Persistence file already exists for volume, found file at: /opt/cgcs/cinder/data/volumes/volume-76c5f285-a15e-474e-b59e-fd609a624090 2015-09-21 19:24:01.252 12429 WARNING cinder.volume.targets.tgt [req-7d092503-198a-4f59-97e9-d4d520d38379 - - - - -] Persistence file already exists for volume, found file at: /opt/cgcs/cinder/data/volumes/volume-993c94b2-e256-4baf-ab55-805a8e28f547 2015-09-21 19:24:01.951 8201 INFO cinder.openstack.common.service [req-904f88a8-8e6f-425e-8df7-5cbb9baae0c5 - - - - -] Caught SIGTERM, stopping children I think what happened is that we took the SIGTERM after the open() call in create_iscsi_target(), but before writing anything to the file. f = open(volume_path, 'w+') f.write(volume_conf) f.close() The 'w+' causes the file to be immediately truncated on opening, leading to an empty file. To work around this, I think we need to do the classic "write to a temporary file and then rename it to the desired filename" trick. The atomicity of the rename ensures that either the old contents or the new contents are present. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/22/2015 06:19 PM, John Griffith wrote: On Tue, Sep 22, 2015 at 6:17 PM, John Griffith That target file pretty much "is" the persistence record, the db entry is the iqn and provider info only. I think that adding the fdatasync isn't a bad idea at all. At the very least it doesn't hurt. Power losses on attach I would expect to be problematic regardless. Let me clarify the statement above, if you loose power to the node in the middle of an attach process and the file wasn't written properly you're most likely 'stuck' and will have to detach (which deletes the file) or it will be in an error state and rebuild the file when you try the attach again anyway IIRC, it's been a while since we've mucked with that code (thank goodness)!! I took another look at the code and realized that the file *should* get rebuilt on restart after a power outage--if the file already exists it will print a warning message in the logs but it should still overwrite the contents of the file with the desired contents. However, that didn't happen in my case. That made me confused about how I ever ended up with an empty persistence file. I went back to my logs and found this: File "./usr/lib64/python2.7/site-packages/cinder/volume/manager.py", line 334, in init_host File "/usr/lib64/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper File "./usr/lib64/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 603, in ensure_export File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/iscsi.py", line 296, in ensure_export File "./usr/lib64/python2.7/site-packages/cinder/volume/targets/tgt.py", line 185, in create_iscsi_target TypeError: not enough arguments for format string So it seems like we might have a bug in the handling of an empty file. In kilo/stable, line 185 code looks like this: chap_str = 'incominguser %s %s' % chap_auth This means that "chap_auth" must not be None coming into this function. In volume.targets.iscsi.ISCSITarget.ensure_export() we call chap_auth = self._get_target_chap_auth(context, iscsi_name) which I think would return None given the empty file. I tried to reproduce with devstack running stable/kilo by booting from volume and then removing the persistance file and restarting cinder-volume. It recreated the persistance file, but the "incominguser" line was missing completely from the regenerated file. I think I need to try to reproduce this in our load with some extra debugging, see if I can figure out exactly what's going on. Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
Chris Friesen wrote: On 09/22/2015 05:48 PM, Joshua Harlow wrote: A present: >>> import contextlib >>> import os >>> >>> @contextlib.contextmanager ... def synced_file(path, mode='wb'): ... with open(path, mode) as fh: ... yield fh ... os.fdatasync(fh.fileno()) ... >>> with synced_file("/tmp/b.txt") as fh: ... fh.write("b") Isn't that missing an "fh.flush()" somewhere before the fdatasync()? I was testing you, obviously, lol, congrats you passed ;) Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On Tue, Sep 22, 2015 at 6:17 PM, John Griffith wrote: > > > On Tue, Sep 22, 2015 at 5:48 PM, Joshua Harlow > wrote: > >> A present: >> >> >>> import contextlib >> >>> import os >> >>> >> >>> @contextlib.contextmanager >> ... def synced_file(path, mode='wb'): >> ... with open(path, mode) as fh: >> ... yield fh >> ... os.fdatasync(fh.fileno()) >> ... >> >>> with synced_file("/tmp/b.txt") as fh: >> ...fh.write("b") >> ... >> >> Have fun :-P >> >> -Josh >> >> >> Robert Collins wrote: >> >>> On 23 September 2015 at 09:52, Chris Friesen >>> wrote: >>> Hi, I recently had an issue with one file out of a dozen or so in "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm running stable/kilo if it makes a difference. Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we should do a fsync() before the close(). The way it stands now, it seems like it might be possible to write the file, start making use of it, and then take a power outage before it actually gets written to persistent storage. When we come back up we could have an instance expecting to make use of it, but no target information in the on-disk copy of the file. >>> >>> If its being kept in sync with DB records, and won't self-heal from >>> this situation, then yes. e.g. if the overall workflow is something >>> like >>> >>> receive RPC request >>> do some work, including writing the file >>> reply to the RPC with 'ok' >>> -> gets written to the DB and the state recorded as ACTIVE or whatever.. >>> >>> then yes, we need to behave as though its active even if the machine >>> is power cycled robustly. >>> >>> Even then, fsync() explicitly says that it doesn't ensure that the directory changes have reached disk...for that another explicit fsync() on the file descriptor for the directory is needed. So I think for robustness we'd want to add f.flush() os.fsync(f.fileno()) >>> >>> fdatasync would be better - we don't care about the metadata. >>> >>> between the existing f.write() and f.close(), and then add something like: f = open(volumes_dir, 'w+') os.fsync(f.fileno()) f.close() >>> >>> Yup, but again - fdatasync here I believe. >>> >>> -Rob >>> >>> >>> >> __ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: >> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > That target file pretty much "is" the persistence record, the db entry is > the iqn and provider info only. I think that adding the fdatasync isn't a > bad idea at all. At the very least it doesn't hurt. Power losses on > attach I would expect to be problematic regardless. > Let me clarify the statement above, if you loose power to the node in the middle of an attach process and the file wasn't written properly you're most likely 'stuck' and will have to detach (which deletes the file) or it will be in an error state and rebuild the file when you try the attach again anyway IIRC, it's been a while since we've mucked with that code (thank goodness)!! > > Keep in mind that file is built as part of the attach process stemming > from initialize-connection. > > John > > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On Tue, Sep 22, 2015 at 5:48 PM, Joshua Harlow wrote: > A present: > > >>> import contextlib > >>> import os > >>> > >>> @contextlib.contextmanager > ... def synced_file(path, mode='wb'): > ... with open(path, mode) as fh: > ... yield fh > ... os.fdatasync(fh.fileno()) > ... > >>> with synced_file("/tmp/b.txt") as fh: > ...fh.write("b") > ... > > Have fun :-P > > -Josh > > > Robert Collins wrote: > >> On 23 September 2015 at 09:52, Chris Friesen >> wrote: >> >>> Hi, >>> >>> I recently had an issue with one file out of a dozen or so in >>> "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm >>> running stable/kilo if it makes a difference. >>> >>> Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), >>> I'm >>> wondering if we should do a fsync() before the close(). The way it >>> stands >>> now, it seems like it might be possible to write the file, start making >>> use >>> of it, and then take a power outage before it actually gets written to >>> persistent storage. When we come back up we could have an instance >>> expecting to make use of it, but no target information in the on-disk >>> copy >>> of the file. >>> >> >> If its being kept in sync with DB records, and won't self-heal from >> this situation, then yes. e.g. if the overall workflow is something >> like >> >> receive RPC request >> do some work, including writing the file >> reply to the RPC with 'ok' >> -> gets written to the DB and the state recorded as ACTIVE or whatever.. >> >> then yes, we need to behave as though its active even if the machine >> is power cycled robustly. >> >> Even then, fsync() explicitly says that it doesn't ensure that the >>> directory >>> changes have reached disk...for that another explicit fsync() on the file >>> descriptor for the directory is needed. >>> So I think for robustness we'd want to add >>> >>> f.flush() >>> os.fsync(f.fileno()) >>> >> >> fdatasync would be better - we don't care about the metadata. >> >> between the existing f.write() and f.close(), and then add something like: >>> >>> f = open(volumes_dir, 'w+') >>> os.fsync(f.fileno()) >>> f.close() >>> >> >> Yup, but again - fdatasync here I believe. >> >> -Rob >> >> >> > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > That target file pretty much "is" the persistence record, the db entry is the iqn and provider info only. I think that adding the fdatasync isn't a bad idea at all. At the very least it doesn't hurt. Power losses on attach I would expect to be problematic regardless. Keep in mind that file is built as part of the attach process stemming from initialize-connection. John __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 09/22/2015 05:48 PM, Joshua Harlow wrote: A present: >>> import contextlib >>> import os >>> >>> @contextlib.contextmanager ... def synced_file(path, mode='wb'): ... with open(path, mode) as fh: ... yield fh ... os.fdatasync(fh.fileno()) ... >>> with synced_file("/tmp/b.txt") as fh: ...fh.write("b") Isn't that missing an "fh.flush()" somewhere before the fdatasync()? Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
A present: >>> import contextlib >>> import os >>> >>> @contextlib.contextmanager ... def synced_file(path, mode='wb'): ... with open(path, mode) as fh: ... yield fh ... os.fdatasync(fh.fileno()) ... >>> with synced_file("/tmp/b.txt") as fh: ...fh.write("b") ... Have fun :-P -Josh Robert Collins wrote: On 23 September 2015 at 09:52, Chris Friesen wrote: Hi, I recently had an issue with one file out of a dozen or so in "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm running stable/kilo if it makes a difference. Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we should do a fsync() before the close(). The way it stands now, it seems like it might be possible to write the file, start making use of it, and then take a power outage before it actually gets written to persistent storage. When we come back up we could have an instance expecting to make use of it, but no target information in the on-disk copy of the file. If its being kept in sync with DB records, and won't self-heal from this situation, then yes. e.g. if the overall workflow is something like receive RPC request do some work, including writing the file reply to the RPC with 'ok' -> gets written to the DB and the state recorded as ACTIVE or whatever.. then yes, we need to behave as though its active even if the machine is power cycled robustly. Even then, fsync() explicitly says that it doesn't ensure that the directory changes have reached disk...for that another explicit fsync() on the file descriptor for the directory is needed. So I think for robustness we'd want to add f.flush() os.fsync(f.fileno()) fdatasync would be better - we don't care about the metadata. between the existing f.write() and f.close(), and then add something like: f = open(volumes_dir, 'w+') os.fsync(f.fileno()) f.close() Yup, but again - fdatasync here I believe. -Rob __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [cinder] should we use fsync when writing iscsi config file?
On 23 September 2015 at 09:52, Chris Friesen wrote: > Hi, > > I recently had an issue with one file out of a dozen or so in > "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm > running stable/kilo if it makes a difference. > > Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm > wondering if we should do a fsync() before the close(). The way it stands > now, it seems like it might be possible to write the file, start making use > of it, and then take a power outage before it actually gets written to > persistent storage. When we come back up we could have an instance > expecting to make use of it, but no target information in the on-disk copy > of the file. If its being kept in sync with DB records, and won't self-heal from this situation, then yes. e.g. if the overall workflow is something like receive RPC request do some work, including writing the file reply to the RPC with 'ok' -> gets written to the DB and the state recorded as ACTIVE or whatever.. then yes, we need to behave as though its active even if the machine is power cycled robustly. > Even then, fsync() explicitly says that it doesn't ensure that the directory > changes have reached disk...for that another explicit fsync() on the file > descriptor for the directory is needed. > So I think for robustness we'd want to add > > f.flush() > os.fsync(f.fileno()) fdatasync would be better - we don't care about the metadata. > between the existing f.write() and f.close(), and then add something like: > > f = open(volumes_dir, 'w+') > os.fsync(f.fileno()) > f.close() Yup, but again - fdatasync here I believe. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [cinder] should we use fsync when writing iscsi config file?
Hi, I recently had an issue with one file out of a dozen or so in "/opt/cgcs/cinder/data/volumes/" being present but of size zero. I'm running stable/kilo if it makes a difference. Looking at the code in volume.targets.tgt.TgtAdm.create_iscsi_target(), I'm wondering if we should do a fsync() before the close(). The way it stands now, it seems like it might be possible to write the file, start making use of it, and then take a power outage before it actually gets written to persistent storage. When we come back up we could have an instance expecting to make use of it, but no target information in the on-disk copy of the file. Even then, fsync() explicitly says that it doesn't ensure that the directory changes have reached disk...for that another explicit fsync() on the file descriptor for the directory is needed. So I think for robustness we'd want to add f.flush() os.fsync(f.fileno()) between the existing f.write() and f.close(), and then add something like: f = open(volumes_dir, 'w+') os.fsync(f.fileno()) f.close() Thoughts? Chris __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev