Re: [gdal-dev] Filesize too large when writing compressed float's to a Geotiff from Python
> It makes sense that the order in which the data is written/stored affects the performance of the compression, but I don't get why it would be different for integers as compared to floats?

Floats are larger than Int8 and Int16, so for the same GDAL_CACHEMAX you can cache fewer blocks. This causes more temporary flushes of partial tiles/strips to disk (partial = containing data for only one or two of the 3 bands), which then need to be refetched when data for the remaining bands becomes available, and then recompressed and rewritten.

--
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev
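To put rough numbers on the cache argument, here is a small arithmetic sketch. The 40 MB cache and the 256x256 block size are illustrative assumptions (neither figure comes from this thread), and the function name is made up for the example. It only shows that the number of cached blocks halves each time the sample size doubles.

```python
# Rough arithmetic only: how many single-band blocks fit in the block cache.
# CACHE_MB and BLOCK are illustrative assumptions, not values from the thread.
CACHE_MB = 40        # assumed GDAL_CACHEMAX, in megabytes
BLOCK = (256, 256)   # assumed block size in pixels (width, height)

BYTES_PER_SAMPLE = {"Int8": 1, "Int16": 2, "Float32": 4, "Float64": 8}


def blocks_in_cache(dtype, cache_mb=CACHE_MB, block=BLOCK):
    """Return how many whole blocks of the given sample type fit in the cache."""
    block_bytes = block[0] * block[1] * BYTES_PER_SAMPLE[dtype]
    return (cache_mb * 1024 * 1024) // block_bytes


for name in BYTES_PER_SAMPLE:
    print(name, blocks_in_cache(name))
```

With fewer blocks resident, a band-after-band write is more likely to push a half-filled block out of the cache before the other bands reach it.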
Re: [gdal-dev] Filesize too large when writing compressed float's to a Geotiff from Python
Even,

Thanks for the suggestions, the first two work well. I'll have a look at ds.WriteRaster; that seems an interesting approach, since it also avoids unnecessary looping over the bands.

Writing per block is what I usually do, maybe that's why I never noticed it before. I ran into it now while fetching and writing a dataset from OpenDAP, whereas I usually read blocks from GTiffs.

It makes sense that the order in which the data is written/stored affects the performance of the compression, but I don't get why it would be different for integers as compared to floats?

Regards,
Rutger

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/Filesize-too-large-when-writing-compressed-float-s-to-a-Geotiff-from-Python-tp5208916p5209075.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.
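For reference, the Dataset.WriteRaster() route might look roughly like the sketch below. The helper name write_stack and the (bands, rows, cols) array layout are assumptions made for the example. It also assumes the array's dtype matches the band type of the dataset, since WriteRaster() is given raw bytes and no explicit buffer type here.

```python
def write_stack(ds, stack):
    """Write a (bands, rows, cols) numpy array with one WriteRaster() call.

    ds is an open, writable gdal.Dataset. The dtype of stack is assumed to
    match the dataset's band type. tobytes() serialises in C order, so the
    buffer holds band 1 in full, then band 2, and so on.
    """
    bands, rows, cols = stack.shape
    if (bands, rows, cols) != (ds.RasterCount, ds.RasterYSize, ds.RasterXSize):
        raise ValueError('array shape does not match the dataset')
    # One call for all bands, so no block is left half-filled in between.
    ds.WriteRaster(0, 0, cols, rows, stack.tobytes())
```

A separate 2D array per band would first have to be combined into one 3D array (for example with numpy.stack) before calling this.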
Re: [gdal-dev] Filesize too large when writing compressed float's to a Geotiff from Python
On Wednesday 03 June 2015 15:21:07, Rutger wrote:
> I would first of all be interested to know if anyone can replicate this behavior? And if there is something I can do to prevent this? I would rather avoid having to run gdal_translate after each file written from Python.

Rutger,

the issue is that you write data band after band, whereas by default the GTiff driver creates pixel-interleaved datasets. So some blocks in the GTiff might be reread and rewritten several times as the data from the various bands arrives.

Several fixes/workarounds:
- if you have sufficient RAM to hold another copy of the uncompressed dataset, increase GDAL_CACHEMAX
- or add options = [ 'INTERLEAVE=BAND' ] in the Create() call to create a band-interleaved dataset
- more involved fix: since there's no dataset WriteArray() in GDAL Python for now, you would have to iterate block by block and for each block write the corresponding region of each band
- you could also use Dataset.WriteRaster() if you can get a buffer from the numpy array

Even
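A minimal sketch of the second and third workarounds follows. The function names are hypothetical, and COMPRESS=DEFLATE and GDT_Float32 are only example choices. The block-wise writer expects one 2D numpy array per band and relies only on Band.GetBlockSize() and Band.WriteArray().

```python
def iter_block_windows(xsize, ysize, block_x, block_y):
    """Yield (xoff, yoff, width, height) for every block, row by row."""
    for yoff in range(0, ysize, block_y):
        height = min(block_y, ysize - yoff)
        for xoff in range(0, xsize, block_x):
            width = min(block_x, xsize - xoff)
            yield xoff, yoff, width, height


def write_blockwise(ds, arrays):
    """Write all bands of one block before moving on to the next block.

    ds is an open, writable gdal.Dataset; arrays[i] is the 2D numpy array
    for band i + 1.
    """
    block_x, block_y = ds.GetRasterBand(1).GetBlockSize()
    for xoff, yoff, width, height in iter_block_windows(
            ds.RasterXSize, ds.RasterYSize, block_x, block_y):
        for index, array in enumerate(arrays):
            window = array[yoff:yoff + height, xoff:xoff + width]
            ds.GetRasterBand(index + 1).WriteArray(window, xoff, yoff)


def create_band_interleaved(path, xsize, ysize, bands):
    """Alternative: create a band-interleaved GTiff (requires GDAL)."""
    from osgeo import gdal  # imported here so the helpers above need no GDAL
    driver = gdal.GetDriverByName('GTiff')
    return driver.Create(path, xsize, ysize, bands, gdal.GDT_Float32,
                         options=['COMPRESS=DEFLATE', 'INTERLEAVE=BAND'])
```

With a band-interleaved file the plain band-after-band loop should be fine as it is; the block-wise loop is for when the pixel-interleaved default has to stay.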
[gdal-dev] Filesize too large when writing compressed float's to a Geotiff from Python
Dear list,

When I try to write a floating-point GeoTIFF from Python (both 32 and 64 bit), the resulting file size is significantly larger than the output of gdal_translate. I wrote a small test script which tests this for various creation options like compression and block sizes. It seems to be the case for any compression method (packbits, lzw, deflate), and other creation options don't seem to matter. Uncompressed data, or compressed integers, work fine; the file size matches gdal_translate very well.

Here is the notebook I used:
http://nbviewer.ipython.org/gist/RutgerK/27c4af235035621fb609

I would first of all be interested to know if anyone can replicate this behavior? And if there is something I can do to prevent this? I would rather avoid having to run gdal_translate after each file written from Python.

I'm running it on Windows 7 64bit. My GDAL version comes from the default Conda repository, which contains both bindings and utilities (at least for version 1.11.1).

All I could find was this thread from 2010; it seems somewhat similar, except that there it's about UInt16, for which it works for me:
http://osgeo-org.1560.x6.nabble.com/gdal-dev-RE-Compression-using-the-create-method-in-python-and-aggregation-methods-td3747703.html

Regards,
Rutger

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/Filesize-too-large-when-writing-compressed-float-s-to-a-Geotiff-from-Python-tp5208916.html
Sent from the GDAL - Dev mailing list archive at Nabble.com.