[gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

2011-03-18 Thread Michal Migurski
Hi,

I'm seeing some odd behavior with virtual raster (VRT) datasets opened 
simultaneously from multiple processes. I hope I can explain this so that it 
makes sense. Here's an excerpt of my Python code:

http://dpaste.com/hold/515217/

Line 8 is where I make a change to the dataset:

source_ds.SetProjection(source_ds.GetGCPProjection())

I do that so that the projection for the ground control points is available for 
a later call to gdal.ReprojectImage(); it wasn't working until I started to use 
SetProjection() in this way. All of this is being called from the context of a 
multi-process web server, running as unprivileged user "www-data" under Ubuntu 
(this is important later). My web server error log fills up with these:

ERROR 1: Failed to write .vrt file in FlushCache().

My assumption here is that because the unprivileged user can't write to the 
dataset file, gdal throws off an error to complain that it can't flush the 
dataset cache back to the original file. So far, this is just an annoyance, but 
one that I would expect to go away when I switched from gdal.Open() to 
gdal.OpenShared() with the read-only flag, like this:

gdal.OpenShared(src_path, gdal.GA_ReadOnly)

Still getting the errors.

Meanwhile, I made a switch in web servers, from an Apache-based CGI environment 
to the multi-worker WSGI server Gunicorn. When I initially ran my code under 
Gunicorn using my normal, privileged user account, I immediately started to see 
failures from gdal.Open and gdal.OpenShared, specifically the assertion errors 
on line 4 of the dpaste above. I tried to place exclusive file locks (using 
fcntl.flock) around each access to a given VRT dataset, but this didn't seem to 
help at all. There were frequent, unpredictable errors with opening data sets 
in a multi-process environment *until* I switched from the privileged user to 
the unprivileged user. Once I did that, everything began to work normally, but 
I got all the old "ERROR 1" reports again.

It seems to me that gdal.OpenShared() with the read-only flag isn't doing what 
it promises, and that it's trying to write back to the files, potentially 
modifying them even as competing processes are accessing them. Is it possible 
that the overlapping processes in my privileged user scenario are seeing 
temporarily-empty VRT files? I'm also confused by the lack of a gdal.Close() 
function or something similar, and by the fact that I can't seem to make a 
change to a dataset in memory without gdal attempting to push that change back 
to disk via FlushCache().

What's the right thing to do here? Make temporary copies of small VRT data sets 
prior to each use so they can be safely written to and disposed of? Build a 
wrapper class that encapsulates copying and disposal? Figure out some way to 
make gdal release datasets when asked, or open them in real read-only mode?
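
The temporary-copy idea could be sketched roughly as below. This is only a minimal illustration using the Python standard library, not the code from either paste; the name temporary_copy and its tmp_dir parameter are made up for the example. It also assumes the VRT refers to its source rasters by absolute path: sources marked relativeToVRT="1" would probably not resolve from a copy placed in another directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_copy(src_path, tmp_dir=None):
    """Yield the path of a private, writable copy of src_path.

    The copy is deleted when the with-block exits. Hypothetical helper,
    not part of GDAL.
    """
    # Keep the original extension (e.g. ".vrt") on the copy.
    suffix = os.path.splitext(src_path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=tmp_dir)
    os.close(fd)  # only the path is needed; copyfile reopens the file
    try:
        shutil.copyfile(src_path, tmp_path)
        yield tmp_path
    finally:
        os.remove(tmp_path)
```

Inside the with-block the yielded path would be handed to gdal.Open() in place of the shared file, and the dataset reference should be dropped before the block ends so the copy is no longer in use when it is removed.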

Any advice greatly appreciated!

-mike.


michal migurski- m...@stamen.com
 415.558.1610



___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

2011-03-18 Thread Even Rouault
Michal,

For a reason that is unclear to me (it might be just historical and not the 
desired behaviour), the VRT driver will try to rewrite the VRT file if the 
dataset has been modified.

There is, however, a workaround to keep the error from popping up on close: you 
can empty the description of the dataset with source_ds.SetDescription('')

Using Open() or OpenShared() makes no difference here.

In Python, you close a dataset by dropping the reference to the object, for 
example by assigning None to it.
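
Put together, the two suggestions above might look like the following minimal sketch. The helper name discard_on_close is hypothetical; the only dataset method it calls is SetDescription(), and it relies on the behaviour described above, namely that a dataset with an empty description is not written back when it is closed.

```python
from contextlib import contextmanager


@contextmanager
def discard_on_close(dataset):
    """Yield dataset; on exit, blank its description so that closing it
    later does not try to rewrite the .vrt file. Hypothetical helper."""
    if dataset is None:
        # gdal.Open() returns None when the file cannot be opened.
        raise IOError("dataset could not be opened")
    try:
        yield dataset
    finally:
        dataset.SetDescription('')


# Usage sketch, assuming the gdal module and src_path from the original post:
#
#   with discard_on_close(gdal.Open(src_path, gdal.GA_ReadOnly)) as source_ds:
#       source_ds.SetProjection(source_ds.GetGCPProjection())
#       ...  # e.g. the later call to gdal.ReprojectImage()
#   source_ds = None  # dropping the last reference closes the dataset
```

Note that the name bound by "as" still refers to the dataset after the with-block, so the dataset is only closed once that last reference is dropped as well.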

It's not clear to me why you get errors with your new web server, but if you 
use a multi-threaded one, did you make sure you built GDAL with thread support 
(./configure --with-threads)? (This has been the default since GDAL 1.8.0.)

Best regards,

Even



Re: [gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

2011-03-18 Thread Michal Migurski
Thanks Even, very helpful!

Gunicorn is not multi-threaded, but it is multi-process, so there are going to 
be concurrent connections to a data set even though I'm not performing any 
threaded functions. I'll try what you suggest, dropping the object reference, 
to see what happens.

-mike.



Re: [gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

2011-03-18 Thread Michal Migurski
Here's what I ended up writing to deal with gdal's desire to write to the VRT 
files:

http://dpaste.com/hold/516167/

-mike.
