Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Trent Piepho
Do you see page file activity?  If you look at /proc/pid/smaps, you
should be able to see the actual status of the mapping of your data
file.  Probably it is consuming a large number of pages of RAM, but
also there should be zero pages written to swap.  All clean private or
clean shared, zero anonymous and zero swap.
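
To make that check repeatable, here is a minimal sketch in C that totals one smaps field for the mappings of a given file. The helper name smaps_sum_kb() and its parameters are made up for illustration; the field names (Private_Clean, Shared_Clean, Anonymous, Swap) are the ones the kernel prints in smaps.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sum one smaps field (e.g. "Swap:" or "Private_Clean:")
 * over all mappings whose header line contains `needle`, typically the name
 * of the mapped file. Returns the total in kB, or -1 if the file cannot be
 * opened. */
long smaps_sum_kb(const char *smaps_path, const char *needle, const char *field)
{
    FILE *f = fopen(smaps_path, "r");
    char line[512];
    int in_match = 0;
    long total = 0;
    size_t flen = strlen(field);

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        /* A mapping header starts with "start-end" in hexadecimal. */
        if (sscanf(line, "%lx-%lx ", &lo, &hi) == 2) {
            in_match = strstr(line, needle) != NULL;
        } else if (in_match && strncmp(line, field, flen) == 0) {
            long kb = 0;
            if (sscanf(line + flen, "%ld", &kb) == 1)
                total += kb;
        }
    }
    fclose(f);
    return total;
}
```

Called as smaps_sum_kb("/proc/self/smaps", "eudem_dem_4258_europe.tif", "Swap:") from inside the test program, it should report 0 if the mapping really is clean and file-backed.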

I think the system unresponsiveness is probably due to I/O scheduling.
Your process has queued a lot of I/O reads and everything else has
to wait in the queue.  So all other I/O sees huge latencies.

Also, a 20 GB mapping is probably thrashing the TLB.  Do huge pages
actually get used?  On the embedded systems I'm more intimately
familiar with, only normal 4k pages are used by user processes.  Huge
TLBs are more of a special case that can be used by the kernel for
things like frame buffer mappings and SoC register windows.


On Wed, Dec 18, 2013 at 2:02 PM, Even Rouault wrote:
> On Wednesday 18 December 2013 21:09:48, Trent Piepho wrote:
>> On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault wrote:
>> > On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
>> >> I imagined an available virtual method on the band which could be
>> >> implemented - primarily by the RawBand class to try and mmap() the data
>> >> and return the layout.  But when that fails, or is unavailable it could
>> >> use your existing methodology with a layout that seems well tuned to
>> >> the underlying data organization.
>> >
>> > Yes, that should be doable, but with the limitation I raised about the
>> > memory management of file-based mmap(): if you mmap() a file larger
>> > than RAM, and read it entirely, without explicit madvise() to discard
>> > regions no longer needed, it will fill RAM and cause disk swapping. I
>> > should retest to confirm. Perhaps there is some OS-level tuning to
>> > avoid that?
>>
>> For Linux, if you mmap a file and do not write to it, the pages will
>> be clean.  This means that under memory pressure those pages can be
>> dropped without paging out to swap.  They are already backed on disk
>> in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
>> malloc() memory from mmap() or brk(), stack, etc.) would need to be
>> written to swap.
>
> Yes, that's the theory. But in practice, on my system (kernel
> 2.6.32-46-generic 64 bit - Ubuntu 10.04 - 4 GB RAM), the system becomes
> rather unresponsive as soon as the process has read a part of the file that
> is equivalent to the initial remaining free RAM. The 'top' utility shows it
> to consume ~ 2.7 GB, which must be the free RAM.
>
> Here's the test program I've used :
>
> test_mmap.c :
>
> #define _LARGEFILE64_SOURCE 1
> #include <assert.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main(int argc, char* argv[])
> {
>     int fd;
>     struct stat64 buf;
>     char* ptr;
>     long long i;
>     int res = 0;
>     int bDontNeed = 0;
>
>     assert( argc == 2 || argc == 3 );
>     if( argc == 3 && strcmp(argv[2], "-dontneed") == 0 )
>         bDontNeed = 1;
>     fd = open(argv[1], O_RDONLY);
>     assert(fd >= 0);
>     assert(stat64(argv[1], &buf) == 0);
>     ptr = (char*) mmap(NULL, buf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
>     /* mmap() reports failure with MAP_FAILED, not NULL */
>     assert(ptr != MAP_FAILED);
>     /* Touch one byte per 4 KB page so that every page gets faulted in */
>     for(i = 0; i < buf.st_size; i += 4096)
>     {
>         /* Discard the pages every 500 MB read */
>         if( bDontNeed && ((i % (1024 * 1024 * 500)) == 0) )
>             madvise(ptr, buf.st_size, MADV_DONTNEED);
>
>         res += ptr[i];
>     }
>     close(fd);
>     return res;
> }
>
> $ gcc -Wall -g test_mmap.c -o test_mmap
>
> $ ./test_mmap eudem_dem_4258_europe.tif
> (the file is 20 GB large)
>
> --> system becomes unresponsive
>
> $ ./test_mmap eudem_dem_4258_europe.tif -dontneed
>
> --> system remains usable. Every 500 MB read, a madvise() call will
> explicitly discard all pages. That's just for test. It couldn't be used in
> practice.
>
> ==> Does anyone reproduce similar behaviour ?
>
>>
>> Of course if you touch a large amount of memory and know you'll never
>> use it again, you can help the OS out when it comes to deciding which
>> pages to free by using madvise.
>>
>> One thing to consider is that a 32-bit OS can only memory map about
>> 2-3 GB at once, even though there is no trouble using files much
>> larger than this size.  If you want to access a large file with
>> mmap(), you might need to use some kind of sliding window.
>
> Yes, I'm well aware of that. But 32-bit systems are now becoming increasingly
> legacy, so we shouldn't worry too much about them.
>
>>
>> I think also, mmaping many gigabytes has a certain cost in setting up
>> the page tables for the mapping that's not insignificant.  Even on a
>> 64-bit os, mmaping a 20 GB file just to access some small portion of
>> it could be inefficient.
>
> Yes, I agree there are hidden costs in the memory management layers of the OS.
> "Huge TLB pages" (2 MB) on AMD64 systems can potentially be a solution to
> decrease that cost. I had started a bit

Re: [gdal-dev] ISO WKB

2013-12-18 Thread Even Rouault
On Thursday 19 December 2013 00:50:06, Paul Ramsey wrote:
> I've updated my working branch to match your intent more closely, I hope
> 
> https://github.com/pramsey/gdal/tree/isowkb
> 
> the iso enumeration is no longer there, and access to iso geometry
> types is via a protected method only.

Yes, looks good (except for some generated files that were accidentally committed).

> 
> The reason I asked about GDAL2 is that some of the stuff in OGR seemed
> new to me (multiple geometry columns, e.g.) and I thought that those
> kinds of changes might be leading to a GDAL2 release.

There are a lot of possible ideas for a GDAL 2
( http://trac.osgeo.org/gdal/wiki/GDAL20Changes ).
Multiple geometry columns is indeed a recent addition
( http://trac.osgeo.org/gdal/wiki/rfc41_multiple_geometry_fields ), but it
doesn't break the C API.
To tag a GDAL 2, we would probably need something more disruptive (although
hopefully not too disruptive!), since GDAL 2 will sound to people's ears like:
"ah, maybe I have to adapt my code that has worked for the past 10 years".
Not sure when this will happen...

> 
> P.
> 
> On Wed, Dec 18, 2013 at 1:20 AM, Even Rouault wrote:
> > On Wednesday 18 December 2013 06:28:16, Paul Ramsey wrote:
> >> I don't think we should expose the ISO geometry types to the world,
> >> they're just for WKB really, so I'll keep that part hidden away. It's
> >> a shame we can't get rid of the 25d type variants for gdal2... if not
> >> then, when?
> > 
> > Ah, I didn't realize you wanted to go that far. Well, that's certainly
> > something that could be done for a GDAL 2. It would require an RFC to draw
> > up the battle plan and analyze the impacts.
> >
> >> Incidentally, is there going to be a GDAL 1.11?
> >
> > Technically, at this point, no breaking changes have been made in trunk,
> > so 1.11 would make sense as a version number.
> > 
> > Even
> > 
> >> P.
> >> 
> >> On Tue, Dec 17, 2013 at 1:50 PM, Even Rouault wrote:
> >> > On Tuesday 17 December 2013 22:38:26, Paul Ramsey wrote:
> >> >> OK, so hide the ISO types from the outside world. No problem.
> >> >> 
> >> >> Is it OK to have getGeometryType and exportToWkb accept wkbVariant
> >> >> optional parameters?
> >> > 
> >> > For exportToWkb(), it is just a matter of taste whether to add an
> >> > optional parameter or to have a dedicated method.
> >> > 
> >> > For getGeometryType(), as it returns a OGRwkbGeometryType, you can't
> >> > add an optional parameter to return values other than
> >> > OGRwkbGeometryType. My latest proposal was to have a - protected -
> >> > "int
> >> > getGeometryType(wkbVariant)  { return
> >> > (eVariant == wkbVariantOgc) ? getGeometryType()  :
> >> > getIsoGeometryType(); }" and a public OGRwkbIsoGeometryType
> >> > getIsoGeometryType().
> >> > 
> >> >> P.
> >> >> 
> >> >> On Tue, Dec 17, 2013 at 1:03 AM, Even Rouault wrote:
> >> >> > Quoting Paul Ramsey:
> >> >> >> Back to this, is it OK?
> >> >> > 
> >> >> > As said in
> >> >> > http://lists.osgeo.org/pipermail/gdal-dev/2013-December/037738.html
> >> >> > , I feel a bit uncomfortable with the extension of the
> >> >> > OGRwkbGeometryType enumeration, which has possible impacts on other
> >> >> > parts of OGR. There may come a time when we touch it, but
> >> >> > I'd expect it to ideally embrace Z, M, ZM, circular geometries at
> >> >> > once. And that would deserve an RFC.
> >> >> > 
> >> >> > What do you think of keeping it an internal enumeration of OGR,
> >> >> > since that's probably all you need for now ?
> >> >> > 
> >> >> > "Or have a separate OGRwkbIsoGeometryType enumeration {
> >> >> > wkbPointIso, ... wkbGeometryCollectionIso, wkbPointIsoZ, ...
> >> >> > wkbGeometryCollectionIsoZ }, a getIsoGeometryType() method that
> >> >> > returns it, and the exportToWkb() methods that calls int
> >> >> > getGeometryType(OGRwkbVariant eVariant) { return (eVariant ==
> >> >> > wkbVariantOgc) ? getGeometryType()  : getIsoGeometryType(); }"
> >> >> > 
> >> >> > I'd be happy to hear other GDAL developers' opinions on this.
> >> >> > 
> >> >> >> How are we patching back to SVN? I can convert
> >> >> >> it into a patch and attach to a ticket, if that's the path.
> >> >> > 
> >> >> > git-svn can be used to bridge the 2 worlds, but in my recent
> >> >> > experience it has been painful to use. So generating a patch and
> >> >> > applying it is probably easier.
> >> >> > 
> >> >> > Even
> >> >> > 
> >> >> > --
> >> >> > Geospatial professional services
> >> >> > http://even.rouault.free.fr/services.html
> >> > 
> >> > --
> >> > Geospatial professional services
> >> > http://even.rouault.free.fr/services.html
> > 
> > --
> > Geospatial professional services
> > http://even.rouault.free.fr/services.html

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] ISO WKB

2013-12-18 Thread Paul Ramsey
I've updated my working branch to match your intent more closely, I hope

https://github.com/pramsey/gdal/tree/isowkb

the iso enumeration is no longer there, and access to iso geometry
types is via a protected method only.
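
For readers less familiar with the two encodings under discussion, here is a minimal sketch in C of how the type code differs between the variants. It relies on two facts: OGR's existing 2.5D types set the high bit (wkb25DBit, 0x80000000) on the 2D code, whereas ISO WKB adds 1000 to the 2D code for Z geometries. The function name and its parameters are hypothetical; they are not part of OGR or of the isowkb branch.

```c
#include <stdint.h>

/* Hypothetical helper: compute the WKB geometry type code from a 2D base
 * code (1 = Point, 2 = LineString, 3 = Polygon, ...). */
uint32_t wkb_type_code(uint32_t base2d, int has_z, int iso)
{
    if (!has_z)
        return base2d;             /* 2D codes are identical in both variants */
    if (iso)
        return base2d + 1000u;     /* ISO WKB: Z types are base + 1000 */
    return base2d | 0x80000000u;   /* OGC/OGR 2.5D: high bit set */
}
```

This is the same choice the quoted getGeometryType(wkbVariant) one-liner makes, only expressed on raw integer codes instead of on the two enumerations.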

The reason I asked about GDAL2 is that some of the stuff in OGR seemed
new to me (multiple geometry columns, e.g.) and I thought that those
kinds of changes might be leading to a GDAL2 release.

P.

On Wed, Dec 18, 2013 at 1:20 AM, Even Rouault wrote:
> On Wednesday 18 December 2013 06:28:16, Paul Ramsey wrote:
>> I don't think we should expose the ISO geometry types to the world,
>> they're just for WKB really, so I'll keep that part hidden away. It's
>> a shame we can't get rid of the 25d type variants for gdal2... if not
>> then, when?
>
> Ah, I didn't realize you wanted to go that far. Well, that's certainly
> something that could be done for a GDAL 2. It would require an RFC to draw up
> the battle plan and analyze the impacts.
>
>>
>> Incidentally, is there going to be a GDAL 1.11?
>
> Technically, at this point, no breaking changes have been made in trunk, so
> 1.11 would make sense as a version number.
>
> Even
>
>>
>> P.
>>
>> On Tue, Dec 17, 2013 at 1:50 PM, Even Rouault wrote:
>> > On Tuesday 17 December 2013 22:38:26, Paul Ramsey wrote:
>> >> OK, so hide the ISO types from the outside world. No problem.
>> >>
>> >> Is it OK to have getGeometryType and exportToWkb accept wkbVariant
>> >> optional parameters?
>> >
>> > For exportToWkb(), it is just a matter of taste whether to add an
>> > optional parameter or to have a dedicated method.
>> >
>> > For getGeometryType(), as it returns a OGRwkbGeometryType, you can't add
>> > an optional parameter to return values other than OGRwkbGeometryType. My
>> > latest proposal was to have a - protected - "int
>> > getGeometryType(wkbVariant)  { return
>> > (eVariant == wkbVariantOgc) ? getGeometryType()  :
>> > getIsoGeometryType(); }" and a public OGRwkbIsoGeometryType
>> > getIsoGeometryType().
>> >
>> >> P.
>> >>
>> >> On Tue, Dec 17, 2013 at 1:03 AM, Even Rouault wrote:
>> >> > Quoting Paul Ramsey:
>> >> >> Back to this, is it OK?
>> >> >
>> >> > As said in
>> >> > http://lists.osgeo.org/pipermail/gdal-dev/2013-December/037738.html, I
>> >> > feel a bit uncomfortable with the extension of the OGRwkbGeometryType
>> >> > enumeration, which has possible impacts on other parts of OGR. There
>> >> > may come a time when we touch it, but I'd expect it to ideally
>> >> > embrace Z, M, ZM, circular geometries at once. And that would deserve
>> >> > an RFC.
>> >> >
>> >> > What do you think of keeping it an internal enumeration of OGR, since
>> >> > that's probably all you need for now ?
>> >> >
>> >> > "Or have a separate OGRwkbIsoGeometryType enumeration { wkbPointIso,
>> >> > ... wkbGeometryCollectionIso, wkbPointIsoZ, ...
>> >> > wkbGeometryCollectionIsoZ }, a getIsoGeometryType() method that
>> >> > returns it, and the exportToWkb() methods that calls int
>> >> > getGeometryType(OGRwkbVariant eVariant) { return (eVariant ==
>> >> > wkbVariantOgc) ? getGeometryType()  : getIsoGeometryType(); }"
>> >> >
>> >> > I'd be happy to hear other GDAL developers' opinions on this.
>> >> >
>> >> >> How are we patching back to SVN? I can convert
>> >> >> it into a patch and attach to a ticket, if that's the path.
>> >> >
>> >> > git-svn can be used to bridge the 2 worlds, but in my recent
>> >> > experience it has been painful to use. So generating a patch and
>> >> > applying it is probably easier.
>> >> >
>> >> > Even
>> >> >
>> >> > --
>> >> > Geospatial professional services
>> >> > http://even.rouault.free.fr/services.html
>> >
>> > --
>> > Geospatial professional services
>> > http://even.rouault.free.fr/services.html
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html


Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Even Rouault
On Wednesday 18 December 2013 21:09:48, Trent Piepho wrote:
> On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault wrote:
> > On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
> >> I imagined an available virtual method on the band which could be
> >> implemented - primarily by the RawBand class to try and mmap() the data
> >> and return the layout.  But when that fails, or is unavailable it could
> >> use your existing methodology with a layout that seems well tuned to
> >> the underlying data organization.
> > 
> > Yes, that should be doable, but with the limitation I raised about the
> > memory management of file-based mmap(): if you mmap() a file larger
> > than RAM, and read it entirely, without explicit madvise() to discard
> > regions no longer needed, it will fill RAM and cause disk swapping. I
> > should retest to confirm. Perhaps there is some OS-level tuning to
> > avoid that?
> 
> For Linux, if you mmap a file and do not write to it, the pages will
> be clean.  This means that under memory pressure those pages can be
> dropped without paging out to swap.  They are already backed on disk
> in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
> malloc() memory from mmap() or brk(), stack, etc.) would need to be
> written to swap.

Yes, that's the theory. But in practice, on my system (kernel
2.6.32-46-generic 64 bit - Ubuntu 10.04 - 4 GB RAM), the system becomes rather
unresponsive as soon as the process has read a part of the file that is
equivalent to the initial remaining free RAM. The 'top' utility shows it to
consume ~ 2.7 GB, which must be the free RAM.

Here's the test program I've used :

test_mmap.c :

#define _LARGEFILE64_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    int fd;
    struct stat64 buf;
    char* ptr;
    long long i;
    int res = 0;
    int bDontNeed = 0;

    assert( argc == 2 || argc == 3 );
    if( argc == 3 && strcmp(argv[2], "-dontneed") == 0 )
        bDontNeed = 1;
    fd = open(argv[1], O_RDONLY);
    assert(fd >= 0);
    assert(stat64(argv[1], &buf) == 0);
    ptr = (char*) mmap(NULL, buf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    /* mmap() reports failure with MAP_FAILED, not NULL */
    assert(ptr != MAP_FAILED);
    /* Touch one byte per 4 KB page so that every page gets faulted in */
    for(i = 0; i < buf.st_size; i += 4096)
    {
        /* Discard the pages every 500 MB read */
        if( bDontNeed && ((i % (1024 * 1024 * 500)) == 0) )
            madvise(ptr, buf.st_size, MADV_DONTNEED);

        res += ptr[i];
    }
    close(fd);
    return res;
}

$ gcc -Wall -g test_mmap.c -o test_mmap

$ ./test_mmap eudem_dem_4258_europe.tif
(the file is 20 GB large)

--> system becomes unresponsive

$ ./test_mmap eudem_dem_4258_europe.tif -dontneed

--> system remains usable. Every 500 MB read, a madvise() call will
explicitly discard all pages. That's just for test. It couldn't be used in
practice.
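
A less blunt variant, offered as a sketch only: release just the chunk that has been consumed instead of the whole mapping. The function name sum_with_discard() and the chunk parameter are made up for illustration. Whether this is enough to keep the system responsive is an assumption that would need testing, since MADV_DONTNEED drops the pages from the process but not necessarily from the page cache.

```c
#define _DEFAULT_SOURCE 1   /* for madvise() under strict -std= modes */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum the first byte of every page of a read-only file
 * mapping, releasing each chunk once it has been read. `chunk` is rounded up
 * to a multiple of the page size. Returns the sum, or -1 on error. */
long long sum_with_discard(const char *path, size_t chunk)
{
    struct stat st;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size, start, i;
    long long sum = 0;
    unsigned char *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;
    p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    chunk = (chunk + page - 1) / page * page;
    if (chunk == 0)
        chunk = page;
    for (start = 0; start < size; start += chunk) {
        size_t len = size - start < chunk ? size - start : chunk;
        for (i = 0; i < len; i += page)
            sum += p[start + i];
        /* Drop only the pages just read; p + start is page-aligned. */
        madvise(p + start, len, MADV_DONTNEED);
    }
    munmap(p, size);
    return sum;
}
```

With chunk set to 500 MB this mirrors the -dontneed run above, except that pages not yet read are left alone.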

==> Does anyone reproduce similar behaviour ?

> 
> Of course if you touch a large amount of memory and know you'll never
> use it again, you can help the OS out when it comes to deciding which
> pages to free by using madvise.
> 
> One thing to consider is that a 32-bit OS can only memory map about
> 2-3 GB at once, even though there is no trouble using files much
> larger than this size.  If you want to access a large file with
> mmap(), you might need to use some kind of sliding window.

Yes, I'm well aware of that. But 32-bit systems are now becoming increasingly
legacy, so we shouldn't worry too much about them.

> 
> I think also, mmaping many gigabytes has a certain cost in setting up
> the page tables for the mapping that's not insignificant.  Even on a
> 64-bit os, mmaping a 20 GB file just to access some small portion of
> it could be inefficient.

Yes, I agree there are hidden costs in the memory management layers of the OS.
"Huge TLB pages" (2 MB) on AMD64 systems can potentially be a solution to
decrease that cost. I had started to experiment a bit with that, but either my
kernel was not recent enough to benefit from all the functionality, or it
didn't seem really practical to use.

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html


Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Trent Piepho
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault wrote:
> On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
>>
>> I imagined an available virtual method on the band which could be
>> implemented - primarily by the RawBand class to try and mmap() the data and
>> return the layout.  But when that fails, or is unavailable it could use
>> your existing methodology with a layout that seems well tuned to the
>> underlying data organization.
>
> Yes, that should be doable, but with the limitation I raised about the memory
> management of file-based mmap(): if you mmap() a file larger than RAM, and
> read it entirely, without explicit madvise() to discard regions no longer
> needed, it will fill RAM and cause disk swapping. I should retest to confirm.
> Perhaps there is some OS-level tuning to avoid that?

For Linux, if you mmap a file and do not write to it, the pages will
be clean.  This means that under memory pressure those pages can be
dropped without paging out to swap.  They are already backed on disk
in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
malloc() memory from mmap() or brk(), stack, etc.) would need to be
written to swap.

Of course if you touch a large amount of memory and know you'll never
use it again, you can help the OS out when it comes to deciding which
pages to free by using madvise.

One thing to consider is that a 32-bit OS can only memory map about
2-3 GB at once, even though there is no trouble using files much
larger than this size.  If you want to access a large file with
mmap(), you might need to use some kind of sliding window.
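
A sliding window could look roughly like the sketch below: map a bounded, page-aligned window, use it, unmap it, and move on. The function name sum_with_window() and the window parameter are made up for illustration, and summing one byte per page only stands in for real work.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: walk a file of any size through a mapping that never
 * exceeds `window` bytes (rounded up to a multiple of the page size), summing
 * the first byte of every page. Returns the sum, or -1 on error. */
long long sum_with_window(const char *path, size_t window)
{
    struct stat st;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    off_t off;
    long long sum = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    window = (window + page - 1) / page * page;
    if (window == 0)
        window = page;
    /* mmap() offsets must be page-aligned, which stepping by `window` ensures. */
    for (off = 0; off < st.st_size; off += (off_t)window) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;
        size_t i;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < len; i += page)
            sum += p[i];
        munmap(p, len);            /* release the window before sliding on */
    }
    close(fd);
    return sum;
}
```

The address space in use stays bounded by the window size, at the price of one mmap()/munmap() pair per window.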

I think also, mmaping many gigabytes has a certain cost in setting up
the page tables for the mapping that's not insignificant.  Even on a
64-bit os, mmaping a 20 GB file just to access some small portion of
it could be inefficient.
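
As a rough order of magnitude (a back-of-the-envelope sketch, not a measurement): a 20 GB mapping with 4 KB pages spans 5,242,880 pages. At 8 bytes per page-table entry on x86-64 that is about 40 MB of last-level page tables if every page ends up mapped, versus 10,240 entries with 2 MB pages. As far as I know Linux fills these entries lazily as pages are touched, so the cost is not necessarily paid at mmap() time. The helper name below is made up.

```c
/* Hypothetical helper: number of pages (and thus last-level page-table
 * entries) needed to map `bytes` bytes with the given page size. */
unsigned long long pages_for(unsigned long long bytes,
                             unsigned long long page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```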


Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Frank Warmerdam
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault wrote:

> > I'm wondering if there would be ways of making what you propose work
> > with Python Numpy in such a way that a numpy array could be requested
> > which is of this virtual memory.  That would also be a nice extension.
>
> Hum, how would that be different from what is proposed in the SWIG bindings
> section of the RFC ?
>
Even,

Ahem - I apparently did not read the RFC closely enough.  You are well
ahead of me on this idea.

Best regards,

-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer

Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Frank Warmerdam
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault wrote:

> On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
> > Even,
> >
> > Sorry, I was thinking of mmap() directly to the file, and having
> > something like:
> >
> > CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
> >                                                   int *pnPixelSpace,
> >                                                   GIntBig *pnLineSpace,
> >                                                   char **papszOptions );
> >
> > I imagined an available virtual method on the band which could be
> > implemented - primarily by the RawBand class to try and mmap() the data
> > and return the layout.  But when that fails, or is unavailable it could
> > use your existing methodology with a layout that seems well tuned to the
> > underlying data organization.
>
> Yes, that should be doable, but with the limitation I raised about the
> memory management of file-based mmap(): if you mmap() a file larger than
> RAM, and read it entirely, without explicit madvise() to discard regions no
> longer needed, it will fill RAM and cause disk swapping. I should retest to
> confirm. Perhaps there is some OS-level tuning to avoid that?

Even,

That was not my experience for readonly mmap() of actual files on disk
"back in the day".

In any event, I'd suggest sticking with what you have, and if I'm keen
perhaps one day I'll try and implement mmap() support.  If I do, I feel
like it needs to go down through the VSI*L system and once a file is
mmapped() the VSI*L IO should also be using the mmaped images.  Once upon a
time this had performance benefits. I'm not sure if that is the case any
more.

Best regards,
Frank

-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer

Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Even Rouault
On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
> Even,
> 
> Sorry, I was thinking of mmap() directly to the file, and having something
> like:
> 
> CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
>  int *pnPixelSpace,
>  GIntBig *pnLineSpace,
>  char **papszOptions );
> 
> I imagined an available virtual method on the band which could be
> implemented - primarily by the RawBand class to try and mmap() the data and
> return the layout.  But when that fails, or is unavailable it could use
> your existing methodology with a layout that seems well tuned to the
> underlying data organization.

Yes, that should be doable, but with the limitation I raised about the memory
management of file-based mmap(): if you mmap() a file larger than RAM, and
read it entirely, without explicit madvise() to discard regions no longer
needed, it will fill RAM and cause disk swapping. I should retest to confirm.
Perhaps there is some OS-level tuning to avoid that?

> 
> Certainly there is no need to hold things up for this.  What you are
> proposing is already wonderfully useful. 

I've no particular timetable for this. This started as an experiment. So I'm 
happy to explore complementary ideas.

> I'm wondering if there would be
> ways of making what you propose work with Python Numpy in such a way that a
> numpy array could be requested which is of this virtual memory.  That would
> also be a nice extension.

Hum, how would that be different from what is proposed in the SWIG bindings 
section of the RFC ?

> 
> Best regards,
> Frank
> 
> 
> 
> On Wed, Dec 18, 2013 at 2:10 AM, Even Rouault wrote:
> > On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> > > Even,
> > >
> > > Very impressive work, I am supportive.
> > >
> > > IMHO it would be wonderful if there was also an mmap() based mechanism
> > > where you could ask for the virtual memory chunk and you get it back
> > > (if it works) along with stride values to access in it.  This could
> > > likely be made to work for most "raw" based formats and a few others
> > > too.  It might also allow non-mmap() based files to return an
> > > organization based more on their actual organization for efficiency.
> >
> > Hi Frank,
> >
> > I'm not completely sure I have understood your idea. Would that be
> > something like:
> >
> > CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
> >                                          GDALRWFlag eRWFlag,
> >                                          int nXOff, int nYOff,
> >                                          int nXSize, int nYSize,
> >                                          int nBufXSize, int nBufYSize,
> >                                          GDALDataType eBufType,
> >                                          int nBandCount, int* panBandMap,
> >                                          int *pnPixelSpace,
> >                                          GIntBig *pnLineSpace,
> >                                          GIntBig *pnBandSpace,
> >                                          size_t nCacheSize,
> >                                          int bSingleThreadUsage,
> >                                          char **papszOptions );
> >
> > Difference with GDALDatasetGetVirtualMem(): the stride values are now
> > output values and there is no more nPageSizeHint parameter.
> >
> > In your mind, would the spacings be determined in a generic way from the
> > dataset properties (block size and INTERLEAVED=PIXEL/BAND metadata item),
> > or would that require some direct cooperation of the driver?
> >
> > Since you mention raw formats, perhaps you are thinking more of a
> > file-based mmap() rather than an anonymous mmap() combined with
> > RasterIO(), as currently proposed? This is something I've mentioned in
> > the "Related thoughts" paragraph, but there are practical annoyances with
> > how Linux manages memory with file-based mmap(). I'd be happy to hear if
> > someone has had successful experience with that, by the way (and in a way
> > that doesn't require an explicit madvise() each time you're done with a
> > range of memory).
> >
> > ---
> >
> > Reading your words again, I'm now wondering if you are thinking of a
> > Dataset / RasterBand virtual method that could be implemented by drivers?
> >
> > virtual CPLVirtualMem* GetVirtualMem(...)
> >
> > They would directly use the low-level CPLVirtualMem to create the mapping
> > and provide their own callback to fill pages when a page fault occurs. So
> > they could potentially avoid using the block cache layer and do direct
> > file I/O?
> > 
> > Looking at RawRasterBand::IRasterIO(), I can see that it can use (under
> > some
> > circumstances with a non obvious heuristics) dire

Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Frank Warmerdam
Even,

Sorry, I was thinking of mmap() directly to the file, and having something
like:

CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
 int *pnPixelSpace,
 GIntBig *pnLineSpace,
 char **papszOptions );

I imagined an available virtual method on the band which could be
implemented - primarily by the RawBand class to try and mmap() the data and
return the layout.  But when that fails, or is unavailable it could use
your existing methodology with a layout that seems well tuned to the
underlying data organization.
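
To make the "return the layout" idea concrete, here is a hedged sketch of how a caller might address a pixel once it has the base pointer and the two spacings. pixel_at() is hypothetical, and the values returned through pnPixelSpace and pnLineSpace are assumed to be byte offsets between consecutive pixels and consecutive lines, as with the RasterIO() spacing arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of pixel (x, y) in a mapped band, given the
 * byte spacing between pixels and between lines. */
const void *pixel_at(const void *base, int x, int y,
                     int pixel_space, int64_t line_space)
{
    return (const unsigned char *)base
           + (ptrdiff_t)y * line_space
           + (ptrdiff_t)x * pixel_space;
}
```

For a band-sequential raw file of Byte pixels the spacings would simply be 1 and the raster width; for a pixel-interleaved file the pixel spacing would be the number of bands times the sample size.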

Certainly there is no need to hold things up for this.  What you are
proposing is already wonderfully useful.  I'm wondering if there would be
ways of making what you propose work with Python Numpy in such a way that a
numpy array could be requested which is of this virtual memory.  That would
also be a nice extension.

Best regards,
Frank



On Wed, Dec 18, 2013 at 2:10 AM, Even Rouault wrote:

> On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> > Even,
> >
> > Very impressive work, I am supportive.
> >
> > IMHO it would be wonderful if there was also an mmap() based mechanism
> > where you could ask for the virtual memory chunk and you get it back (if
> > it works) along with stride values to access in it.  This could likely be
> > made to work for most "raw" based formats and a few others too.  It might
> > also allow non-mmap() based files to return an organization based more on
> > their actual organization for efficiency.
>
> Hi Frank,
>
> I'm not completely sure I have understood your idea. Would that be
> something like:
>
> CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
>                                          GDALRWFlag eRWFlag,
>                                          int nXOff, int nYOff,
>                                          int nXSize, int nYSize,
>                                          int nBufXSize, int nBufYSize,
>                                          GDALDataType eBufType,
>                                          int nBandCount, int* panBandMap,
>                                          int *pnPixelSpace,
>                                          GIntBig *pnLineSpace,
>                                          GIntBig *pnBandSpace,
>                                          size_t nCacheSize,
>                                          int bSingleThreadUsage,
>                                          char **papszOptions );
>
> Difference with GDALDatasetGetVirtualMem(): the stride values are now
> output values and there is no more nPageSizeHint parameter.
>
> In your mind, would the spacings be determined in a generic way from the
> dataset properties (block size and INTERLEAVED=PIXEL/BAND metadata item), or
> would that require some direct cooperation of the driver?
>
> Since you mention raw formats, perhaps you are thinking more of a
> file-based mmap() rather than an anonymous mmap() combined with RasterIO(),
> as currently proposed? This is something I've mentioned in the "Related
> thoughts" paragraph, but there are practical annoyances with how Linux
> manages memory with file-based mmap(). I'd be happy to hear if someone has
> had successful experience with that, by the way (and in a way that doesn't
> require an explicit madvise() each time you're done with a range of memory).
>
> ---
>
> Reading your words again, I'm now wondering if you are thinking of a
> Dataset / RasterBand virtual method that could be implemented by drivers?
>
> virtual CPLVirtualMem* GetVirtualMem(...)
>
> They would directly use the low-level CPLVirtualMem to create the mapping
> and provide their own callback to fill pages when a page fault occurs. So
> they could potentially avoid using the block cache layer and do direct file
> I/O?
>
> Looking at RawRasterBand::IRasterIO(), I can see that it can use (under
> some circumstances, with a non-obvious heuristic) direct file I/O without
> going to the block cache. So the current proposed implementation could
> potentially already benefit from that. Perhaps we would need a flag to
> RasterIO to ask it to avoid the block cache when possible. Or just call
> CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
> GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved()
>
> Even
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html
>



-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] Hourly Consultant to help with python script

2013-12-18 Thread Chaitanya kumar CH
Hi Dennis,

I'd like to take up your offer. This looks like a small enough job. I can
use the gdal_polygonize algorithm to process all the areas with a value
other than the nodata value.

Please email me the details and sample data if you would like me to work on
it.

--
Best regards,
Chaitanya Kumar CH
On Dec 18, 2013 9:07 PM, "Dennis Burgess"  wrote:

> I have a python script that I need to convert existing geotiffs to poly
> KML files.   The existing files are multi-colored, and all I want is any
> colored area, vs the background color.
>
>
>
> If you are interested, we can pay via paypal, send me your hourly rate and
> when you are free via e-mail!   We will remit payment once work is
> completed.
>
>
>
>
>
> *Dennis Burgess, *
>
>
>

[gdal-dev] Hourly Consultant to help with python script

2013-12-18 Thread Dennis Burgess
I have a python script that I need to convert existing geotiffs to poly
KML files.   The existing files are multi-colored, and all I want is any
colored area, vs the background color.

If you are interested, we can pay via paypal, send me your hourly rate
and when you are free via e-mail!   We will remit payment once work is
completed.

Dennis Burgess,

Re: [gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

2013-12-18 Thread Even Rouault
On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> Even,
> 
> Very impressive work, I am supportive.
> 
> IMHO it would be wonderful if there was also an mmap() based mechanism
> where you could ask for the virtual memory chunk and you get it back (if it
> works) along with stride values to access in it.  This could likely be made
> to work for most "raw" based formats and a few others too.  It might also
> allow non-mmap() based files to return an organization based more on their
> actual organization for efficiency.

Hi Frank,

I'm not completely sure I have understood your idea. Would it be something
like:

CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
 GDALRWFlag eRWFlag,
 int nXOff, int nYOff,
 int nXSize, int nYSize,
 int nBufXSize, int nBufYSize,
 GDALDataType eBufType,
 int nBandCount, int* panBandMap,
 int *pnPixelSpace,
 GIntBig *pnLineSpace,
 GIntBig *pnBandSpace,
 size_t nCacheSize,
 int bSingleThreadUsage,
 char **papszOptions );

Difference from GDALDatasetGetVirtualMem(): the stride values are now output
values, and there is no longer an nPageSizeHint parameter.
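
To make the "strides as output values" idea concrete, here is a minimal sketch of how a caller might address pixels once the three spacings have been returned. The names Strides, ByteOffset, BandSequential and PixelInterleaved are hypothetical and only for illustration; they are not part of the proposal, and the two layouts shown are just the obvious candidates, not what a driver would necessarily report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the spacings that the proposed function would
// return through pnPixelSpace, pnLineSpace and pnBandSpace.
struct Strides {
    int          nPixelSpace;  // bytes between two consecutive pixels of a line
    std::int64_t nLineSpace;   // bytes between two consecutive lines
    std::int64_t nBandSpace;   // bytes between two consecutive bands
};

// Byte offset of pixel (x, y) of band iBand (0-based) from the start of the
// mapping, whatever layout the spacings describe.
inline std::int64_t ByteOffset(const Strides& s, int iBand, int x, int y)
{
    assert(iBand >= 0 && x >= 0 && y >= 0);
    return iBand * s.nBandSpace + y * s.nLineSpace
         + static_cast<std::int64_t>(x) * s.nPixelSpace;
}

// Spacings for a band-sequential layout (one full band after the other).
inline Strides BandSequential(int nXSize, int nYSize, int nDTSize)
{
    const std::int64_t nLine = static_cast<std::int64_t>(nXSize) * nDTSize;
    return { nDTSize, nLine, nLine * nYSize };
}

// Spacings for a pixel-interleaved layout (all bands of a pixel together).
inline Strides PixelInterleaved(int nXSize, int nBands, int nDTSize)
{
    const int nPixel = nBands * nDTSize;
    return { nPixel, static_cast<std::int64_t>(nXSize) * nPixel, nDTSize };
}
```

The point of the sketch is that the caller's addressing code is identical for both layouts; only the values of the spacings change.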

In your mind, would the spacings be determined in a generic way from the
dataset properties (block size and INTERLEAVED=PIXEL/BAND metadata item), or
would that require some direct cooperation from the driver?

Since you mention raw formats, perhaps you are thinking more of a file-based
mmap() rather than an anonymous mmap() combined with RasterIO(), as currently
proposed? This is something I mentioned in the "Related thoughts" paragraph,
but there are practical annoyances with how Linux manages memory with
file-based mmap(). I'd be happy to hear from anyone who has had a successful
experience with that, by the way (one that doesn't require an explicit
madvise() each time you're done with a range of memory).
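
For reference, the pattern I would like to avoid looks roughly like the sketch below: a read-only file-based mapping that is consumed chunk by chunk, with an explicit madvise() after each chunk. SumFileBytes is a hypothetical helper written only for this illustration, and whether MADV_DONTNEED actually relieves memory pressure in this situation is precisely the open question, so take it as a sketch and not as a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sums all bytes of a file through a read-only file-based mapping, calling
// madvise() on each chunk once it has been consumed.
// nChunkSize must be a non-zero multiple of the page size.
// Returns false on failure.
bool SumFileBytes(const char* pszPath, std::size_t nChunkSize,
                  std::uint64_t& nSum)
{
    nSum = 0;
    const long nPageSize = sysconf(_SC_PAGESIZE);
    if (nPageSize <= 0 || nChunkSize == 0 ||
        nChunkSize % static_cast<std::size_t>(nPageSize) != 0)
        return false;

    const int fd = open(pszPath, O_RDONLY);
    if (fd < 0)
        return false;
    struct stat sStat;
    if (fstat(fd, &sStat) != 0) { close(fd); return false; }
    const std::size_t nSize = static_cast<std::size_t>(sStat.st_size);
    if (nSize == 0) { close(fd); return true; }  // nothing to map

    void* pMap = mmap(nullptr, nSize, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (pMap == MAP_FAILED)
        return false;

    const unsigned char* pabyData = static_cast<const unsigned char*>(pMap);
    for (std::size_t nOff = 0; nOff < nSize; nOff += nChunkSize) {
        const std::size_t nLen =
            (nSize - nOff < nChunkSize) ? nSize - nOff : nChunkSize;
        for (std::size_t i = 0; i < nLen; ++i)
            nSum += pabyData[nOff + i];
        // Tell the kernel we are done with this range. The start address is
        // page-aligned because nOff is a multiple of the page size.
        madvise(static_cast<char*>(pMap) + nOff, nLen, MADV_DONTNEED);
    }
    munmap(pMap, nSize);
    return true;
}
```

Having to sprinkle such calls in user code is what makes the file-based approach less attractive than a mapping whose cache size is bounded by construction.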

---

Reading your words again, I'm now wondering whether you are thinking of a
Dataset / RasterBand virtual method that could be implemented by drivers?

virtual CPLVirtualMem* GetVirtualMem(...)

They would directly use the low-level CPLVirtualMem to create the mapping and
provide their own callback to fill pages when a page fault occurs. So they
could potentially avoid using the block cache layer and do direct file I/O?

Looking at RawRasterBand::IRasterIO(), I can see that it can use direct file
I/O (under some circumstances, with a non-obvious heuristic) without going
through the block cache. So the currently proposed implementation could
potentially already benefit from that. Perhaps we would need a flag to
RasterIO to ask it to avoid the block cache when possible. Or just call
CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved()

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

Re: [gdal-dev] ISO WKB

2013-12-18 Thread Even Rouault
On Wednesday 18 December 2013 06:28:16, Paul Ramsey wrote:
> I don't think we should expose the ISO geometry types to the world,
> they're just for WKB really, so I'll keep that part hidden away. It's
> a shame we can't get rid of the 25d type variants for gdal2... if not
> then, when?

Ah, I didn't realize you wanted to go that far. Well, that's certainly
something that could be done for a GDAL 2. It would require an RFC to draw up
the battle plan and analyze the impacts.

> 
> Incidentally, is there going to be a GDAL 1.11?

Technically, at this point, no breaking changes have been made in trunk, so
1.11 would make sense as a version number.

Even

> 
> P.
> 
> On Tue, Dec 17, 2013 at 1:50 PM, Even Rouault
> 
>  wrote:
> > On Tuesday 17 December 2013 22:38:26, Paul Ramsey wrote:
> >> OK, so hide the ISO types from the outside world. No problem.
> >> 
> >> Is it OK to have getGeometryType and exportToWkb accept wkbVariant
> >> optional parameters?
> > 
> > For exportToWkb(), it is just a matter of taste whether to add an
> > optional parameter or to have a dedicated method.
> > 
> > For getGeometryType(), as it returns an OGRwkbGeometryType, you can't add
> > an optional parameter to return values other than OGRwkbGeometryType. My
> > latest proposal was to have a protected "int
> > getGeometryType(OGRwkbVariant eVariant) { return (eVariant ==
> > wkbVariantOgc) ? getGeometryType() : getIsoGeometryType(); }" and a
> > public OGRwkbIsoGeometryType getIsoGeometryType().
> > 
> >> P.
> >> 
> >> On Tue, Dec 17, 2013 at 1:03 AM, Even Rouault
> >> 
> >>  wrote:
> >> > Quoting Paul Ramsey:
> >> >> Back to this, is it OK?
> >> > 
> >> > As said in
> >> > http://lists.osgeo.org/pipermail/gdal-dev/2013-December/037738.html, I
> >> > feel a bit uncomfortable with the extension of the OGRwkbGeometryType
> >> > enumeration, which has possible impacts on other parts of OGR. There
> >> > may come a time when we touch it, but I'd expect it to ideally
> >> > embrace Z, M, ZM, circular geometries at once. And that would deserve
> >> > an RFC.
> >> > 
> >> > What do you think of keeping it an internal enumeration of OGR, since
> >> > that's probably all you need for now ?
> >> > 
> >> > "Or have a separate OGRwkbIsoGeometryType enumeration { wkbPointIso,
> >> > ... wkbGeometryCollectionIso, wkbPointIsoZ, ...
> >> > wkbGeometryCollectionIsoZ }, a getIsoGeometryType() method that
> >> > returns it, and exportToWkb() methods that call int
> >> > getGeometryType(OGRwkbVariant eVariant) { return (eVariant ==
> >> > wkbVariantOgc) ? getGeometryType() : getIsoGeometryType(); }"
> >> > 
> >> > I'd be happy to hear other GDAL developers' opinions on this.
> >> > 
> >> >> How are we patching back to SVN? I can convert
> >> >> it into a patch and attach to a ticket, if that's the path.
> >> > 
> >> > git-svn can be used to bridge the 2 worlds, but in my recent
> >> > experience it has been painful to use. So generating a patch and
> >> > applying it is probably easier.
> >> > 
> >> > Even
> >> > 
> >> > --
> >> > Geospatial professional services
> >> > http://even.rouault.free.fr/services.html
> > 
> > --
> > Geospatial professional services
> > http://even.rouault.free.fr/services.html
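
For readers following the thread, the variant dispatch quoted above can be sketched on plain integer codes as follows. It relies on two facts: OGR marks its 2.5D types by setting the 0x80000000 bit on the base type code, and ISO WKB encodes Z types as the base code plus 1000. WkbVariantSketch, ToIsoCode and GeometryTypeCode are hypothetical names for this illustration, not the enumerations or methods under discussion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the WKB variant selector.
enum class WkbVariantSketch { Ogc, Iso };

// Bit that OGR sets on a base type code to mark a 2.5D geometry type.
constexpr std::uint32_t k25DBit = 0x80000000u;

// Translate an OGR type code into the ISO WKB code: 2D codes are unchanged,
// 2.5D codes become "base code + 1000".
constexpr std::uint32_t ToIsoCode(std::uint32_t nOgrType)
{
    return (nOgrType & k25DBit) ? (nOgrType & ~k25DBit) + 1000u : nOgrType;
}

// Same shape as the quoted one-liner: pick the code to write in the WKB
// header depending on the requested variant.
constexpr std::uint32_t GeometryTypeCode(std::uint32_t nOgrType,
                                         WkbVariantSketch eVariant)
{
    return eVariant == WkbVariantSketch::Ogc ? nOgrType : ToIsoCode(nOgrType);
}
```

The ISO codes stay confined to the WKB export path this way, and nothing else in OGR has to know about them.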

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html