Re: [gdal-dev] Neatline for USGS PDF maps
I found this in drafts and it appears I failed to send it. Sorry for the delay. Sent partly for the list archives at this point.

On Sat, Jan 19, 2013 at 8:14 AM, Even Rouault even.roua...@mines-paris.org wrote:
> Looking more closely at those files, I see that there are various registration blocks. The algorithm up to now was to select the registration block whose neatline covered the most area in terms of pixels. In the case of OR_Newport_North_20110824_TM_geo.pdf, those blocks are:
>
>     UTM Grid and Projection
>     Orthoimage
>     Map Layers
>     Adjoining Quadrangles Diagram
>
> The number and names of the blocks may change, but in all the USGS topo PDF samples I've tried, the Map Layers block is always present and seems to be the one that leads to the best results, so I've just pushed a change to select it when it is found.

--config GDAL_PDF_NEATLINE is very helpful. Did you find the registration block names with one of the supporting PDF libraries? Is it possible to find these multiple registration block names from gdal? (I tried: gdalinfo --debug on without success.)

Thanks for the many recent improvements for the USGS topo PDFs. --config GDAL_PDF_RENDERING_OPTIONS is very useful.

> You can use the following Python script to automate fetching the neatline and launching gdalwarp to use it:
>
>     from osgeo import gdal
>     import os
>     import sys
>
>     # Read the neatline (a WKT polygon) from the PDF metadata
>     ds = gdal.Open(sys.argv[1])
>     neatline_wkt = ds.GetMetadataItem("NEATLINE")
>     ds = None
>
>     # Write it to a one-record CSV that gdalwarp can use as a cutline
>     f = open('cutline.csv', 'wt')
>     f.write('id,WKT\n')
>     f.write('1,"%s"\n' % neatline_wkt)
>     f.close()
>
>     # Crop the PDF to the neatline, writing <input>.tif
>     os.system('gdalwarp %s %s.tif ' % (sys.argv[1], sys.argv[1]) +
>               '-crop_to_cutline -cutline cutline.csv -overwrite')

This is great. I've added it to the wiki, http://trac.osgeo.org/gdal/wiki/USGS_PDF_Topo

> If you're interested in only the raster part, and assuming the above script is called cutline.py, you can try the following:
>
>     export GDAL_PDF_RENDERING_OPTIONS=RASTER
>     (or set GDAL_PDF_RENDERING_OPTIONS=RASTER on Windows)
>     python cutline.py your.pdf
>     nearblack your.pdf -o your_rgba.pdf -of GTiff -setalpha -color 0,0,0 \
>         -color 255,255,255

I interpreted the above as:

    nearblack your.tif -o your_rgba.tif -of GTiff -setalpha -color 0,0,0 -color 255,255,255

where your.tif is the output from cutline.py and your_rgba.tif is the output from nearblack.

Thanks,
Eli

> Best regards,
> Even

___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev
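A note on file names for anyone reading this in the archives: the cutline script as posted writes its gdalwarp output to the input path plus ".tif", so for your.pdf the file that nearblack should read would be your.pdf.tif. The sketch below only builds the nearblack argument list under that reading; the helper names and the your_rgba.tif output name are hypothetical, and nothing here calls GDAL itself.

```python
import shlex


def cutline_output_name(pdf_path):
    """Output name used by the posted cutline script: input path plus '.tif'."""
    return pdf_path + ".tif"


def nearblack_command(pdf_path, rgba_path):
    """Build the nearblack argument list for the GeoTIFF written by gdalwarp.

    The flags are the ones quoted in this thread; rgba_path is a
    hypothetical output name chosen by the caller.
    """
    return [
        "nearblack", cutline_output_name(pdf_path),
        "-o", rgba_path,
        "-of", "GTiff",
        "-setalpha",
        "-color", "0,0,0",
        "-color", "255,255,255",
    ]


if __name__ == "__main__":
    # Print the command line instead of running it.
    cmd = nearblack_command("your.pdf", "your_rgba.tif")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be handed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.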
Re: [gdal-dev] Neatline for USGS PDF maps
On Wednesday 17 April 2013 17:34:13, Eli Adam wrote:
> I found this in drafts and it appears I failed to send it. Sorry for the delay. Sent partly for the list archives at this point.
>
> On Sat, Jan 19, 2013 at 8:14 AM, Even Rouault even.roua...@mines-paris.org wrote:
> > Looking more closely at those files, I see that there are various registration blocks. The algorithm up to now was to select the registration block whose neatline covered the most area in terms of pixels. In the case of OR_Newport_North_20110824_TM_geo.pdf, those blocks are:
> >
> >     UTM Grid and Projection
> >     Orthoimage
> >     Map Layers
> >     Adjoining Quadrangles Diagram
> >
> > The number and names of the blocks may change, but in all the USGS topo PDF samples I've tried, the Map Layers block is always present and seems to be the one that leads to the best results, so I've just pushed a change to select it when it is found.
>
> --config GDAL_PDF_NEATLINE is very helpful. Did you find the registration block names with one of the supporting PDF libraries? Is it possible to find these multiple registration block names from gdal? (I tried: gdalinfo --debug on without success.)

I can see them in the "PDF: Description = " lines:

    $ gdalinfo --debug on ~/gdal/data/geopdf/OR_Newport_North_20110824_TM_geo.pdf
    PDF: DPI guessed from contents stream = 600.0509320574655
    PDF: OGC Encoding Best Practice style detected
    PDF: LGIDict Version : 2.3
    PDF: Description = UTM Grid and Projection
    PDF: This is the largest neatline for now
    PDF: LGIDict Version : 2.3
    PDF: Description = Orthoimage
    PDF: Not the largest neatline. Skipping it
    PDF: LGIDict Version : 2.3
    PDF: Description = Map Layers
    PDF: The Map Layers registration will be selected
    PDF: LGIDict Version : 2.3
    PDF: Description = Adjoining Quadrangles Diagram
    PDF: Not the largest neatline. Skipping it
    PDF: Description = Map Layers
    [...]
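If the block names are wanted in a script rather than read by eye, the debug output above can be filtered for its "PDF: Description = " lines. This is a minimal sketch that works on captured text only; the function name is hypothetical, and it assumes the debug messages were saved to a string first (for example by redirecting the stream gdalinfo writes them to).

```python
import re

# Matches the debug lines shown in the message above.
DESCRIPTION_RE = re.compile(r"^PDF: Description = (.+?)\s*$")


def registration_blocks(debug_text):
    """Return distinct registration block names in order of first appearance."""
    names = []
    for line in debug_text.splitlines():
        match = DESCRIPTION_RE.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Debug lines copied from the message above.
SAMPLE = """\
PDF: LGIDict Version : 2.3
PDF: Description = UTM Grid and Projection
PDF: This is the largest neatline for now
PDF: LGIDict Version : 2.3
PDF: Description = Orthoimage
PDF: Not the largest neatline. Skipping it
PDF: LGIDict Version : 2.3
PDF: Description = Map Layers
PDF: The Map Layers registration will be selected
PDF: LGIDict Version : 2.3
PDF: Description = Adjoining Quadrangles Diagram
PDF: Not the largest neatline. Skipping it
PDF: Description = Map Layers
"""

if __name__ == "__main__":
    for name in registration_blocks(SAMPLE):
        print(name)
```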
Re: [gdal-dev] Neatline for USGS PDF maps
On Saturday 19 January 2013 03:38:16, Eli Adam wrote:
> Checking over some USGS topo PDFs, the neatline reported appears too large. Has anyone else checked this or noticed anything similar? Specific details below.

Eli,

Looking more closely at those files, I see that there are various registration blocks. The algorithm up to now was to select the registration block whose neatline covered the most area in terms of pixels. In the case of OR_Newport_North_20110824_TM_geo.pdf, those blocks are:

    UTM Grid and Projection
    Orthoimage
    Map Layers
    Adjoining Quadrangles Diagram

The number and names of the blocks may change, but in all the USGS topo PDF samples I've tried, the Map Layers block is always present and seems to be the one that leads to the best results, so I've just pushed a change to select it when it is found.

You can use the following Python script to automate fetching the neatline and launching gdalwarp to use it:

    from osgeo import gdal
    import os
    import sys

    # Read the neatline (a WKT polygon) from the PDF metadata
    ds = gdal.Open(sys.argv[1])
    neatline_wkt = ds.GetMetadataItem("NEATLINE")
    ds = None

    # Write it to a one-record CSV that gdalwarp can use as a cutline
    f = open('cutline.csv', 'wt')
    f.write('id,WKT\n')
    f.write('1,"%s"\n' % neatline_wkt)
    f.close()

    # Crop the PDF to the neatline, writing <input>.tif
    os.system('gdalwarp %s %s.tif ' % (sys.argv[1], sys.argv[1]) +
              '-crop_to_cutline -cutline cutline.csv -overwrite')

Best regards,
Even
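The selection rule described in the message above (prefer the Map Layers block when present, otherwise fall back to the largest neatline) can be sketched in a few lines. This is an illustration of the rule as described, not the driver's actual code; the function name, the tuple layout and the pixel areas in the example are all made up for the purpose.

```python
def select_registration(blocks, preferred="Map Layers"):
    """Pick a registration block name.

    blocks is a list of (description, neatline_area_in_pixels) tuples,
    a hypothetical structure used only for this sketch.
    """
    if not blocks:
        return None
    # New behaviour: take the preferred block whenever it exists.
    for description, _area in blocks:
        if description == preferred:
            return description
    # Previous behaviour: the block whose neatline covers the most pixels.
    return max(blocks, key=lambda block: block[1])[0]


if __name__ == "__main__":
    # Areas below are invented numbers, not values read from any PDF.
    example = [
        ("UTM Grid and Projection", 4.0e7),
        ("Orthoimage", 3.0e7),
        ("Map Layers", 3.0e7),
        ("Adjoining Quadrangles Diagram", 1.0e5),
    ]
    print(select_registration(example))
```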
Re: [gdal-dev] Neatline for USGS PDF maps
On Saturday 19 January 2013 16:28:53, Even Rouault wrote:
> On Saturday 19 January 2013 03:38:16, Eli Adam wrote:
> > Checking over some USGS topo PDFs, the neatline reported appears too large. Has anyone else checked this or noticed anything similar? Specific details below.
>
> Eli,
>
> Looking more closely at those files, I see that there are various registration blocks. The algorithm up to now was to select the registration block whose neatline covered the most area in terms of pixels. In the case of OR_Newport_North_20110824_TM_geo.pdf, those blocks are:
>
>     UTM Grid and Projection
>     Orthoimage
>     Map Layers
>     Adjoining Quadrangles Diagram
>
> The number and names of the blocks may change, but in all the USGS topo PDF samples I've tried, the Map Layers block is always present and seems to be the one that leads to the best results, so I've just pushed a change to select it when it is found.
>
> You can use the following Python script to automate fetching the neatline and launching gdalwarp to use it:
>
>     from osgeo import gdal
>     import os
>     import sys
>
>     # Read the neatline (a WKT polygon) from the PDF metadata
>     ds = gdal.Open(sys.argv[1])
>     neatline_wkt = ds.GetMetadataItem("NEATLINE")
>     ds = None
>
>     # Write it to a one-record CSV that gdalwarp can use as a cutline
>     f = open('cutline.csv', 'wt')
>     f.write('id,WKT\n')
>     f.write('1,"%s"\n' % neatline_wkt)
>     f.close()
>
>     # Crop the PDF to the neatline, writing <input>.tif
>     os.system('gdalwarp %s %s.tif ' % (sys.argv[1], sys.argv[1]) +
>               '-crop_to_cutline -cutline cutline.csv -overwrite')

If you're interested in only the raster part, and assuming the above script is called cutline.py, you can try the following:

    export GDAL_PDF_RENDERING_OPTIONS=RASTER
    (or set GDAL_PDF_RENDERING_OPTIONS=RASTER on Windows)
    python cutline.py your.pdf
    nearblack your.pdf -o your_rgba.pdf -of GTiff -setalpha -color 0,0,0 \
        -color 255,255,255

Best regards,
Even
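One detail of the cutline file is worth spelling out: the neatline WKT contains commas, so it has to be quoted in the CSV or it will be split across columns. A possible variation on the script's file-writing step, using Python's csv module so the quoting is handled automatically, is sketched below. The function name is hypothetical and the polygon is a toy value, not a real neatline.

```python
import csv
import io


def write_cutline_csv(stream, neatline_wkt):
    """Write a one-record cutline CSV with columns id and WKT.

    csv.writer quotes any field that contains the delimiter, so the
    commas inside the WKT stay within a single column.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["id", "WKT"])
    writer.writerow([1, neatline_wkt])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Toy polygon for illustration only.
    write_cutline_csv(buffer, "POLYGON ((0 0,0 1,1 1,0 0))")
    print(buffer.getvalue(), end="")
```

To write the real file, open cutline.csv with newline='' and pass the file object in place of the StringIO buffer.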
[gdal-dev] Neatline for USGS PDF maps
Checking over some USGS topo PDFs, the neatline reported appears too large. Has anyone else checked this or noticed anything similar? Specific details below.

I did the same thing and got the same result. If you make a shapefile out of the neatline and view it, you will see that it matches the black border. So it is a correct result, but not the intended one. We need different values for the neatline. Here are values that I estimated by eye in QGIS:

    Record_Id,wkb_Polygon
    1,POLYGON ((420793 4955647,420689 4941858,410784 4942004,410971 4955792))

Using this gives the expected results.

Does this PDF file have incorrect neatline information? I'll look at some others to see if they work better.

It looks to me that the USGS topo PDF from http://ims.er.usgs.gov/gda_services/download?item_id=5365522 reports a neatline that covers most of the PDF,

    NEATLINE=POLYGON ((421614.539994676539209 4956417.675689895637333,421413.787766160559841 4941008.479600958526134,409984.382813899661414 4941157.382794972509146,410185.135042413836345 4956566.578883905895054,421614.539994676539209 4956417.675689895637333))

but should cover much less area, as estimated in QGIS above. Is this an error within the file, an error in what gdalinfo reports, or something else? If it is an error in the file, I can contact the USGS liaison for the Pacific Northwest to see if it can be fixed (at least for Oregon).

I checked other USGS topos in OR, CO and MI and found the same problem. I tried some in ND and IA and the neatline seemed correct. Specifically,
http://ims.er.usgs.gov/gda_services/download?item_id=5154397&quad=Granger&state=IA&grid=7.5X7.5&series=TNM%20GeoPDF
and
http://ims.er.usgs.gov/gda_services/download?item_id=5251428&quad=Nelson%20Lake&state=ND&grid=7.5X7.5&series=TNM%20GeoPDF

It is great to have the PDF driver to make more data accessible. GDAL/OGR always makes me smile when I encounter data in some new-to-me format and it is already supported (in the last few months, SEG-Y).

Best Regards,
Eli
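To put a rough number on "too large", the two polygons quoted in this message can be compared by area. The sketch below is plain Python with no GDAL dependency: it parses the outer ring of a simple POLYGON WKT string and applies the shoelace formula. The function names are hypothetical, the parser handles only single-ring polygons like the ones above, and the coordinates are copied from this message.

```python
import re


def parse_polygon_ring(wkt):
    """Parse the outer ring of a simple 'POLYGON ((x y,x y,...))' string."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt)
    if match is None:
        raise ValueError("not a simple POLYGON WKT")
    return [tuple(float(v) for v in pair.split())
            for pair in match.group(1).split(",")]


def ring_area(ring):
    """Shoelace formula; a repeated closing point adds nothing to the sum."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Neatline reported by gdalinfo for the Oregon quad (from this message).
REPORTED = ("POLYGON ((421614.539994676539209 4956417.675689895637333,"
            "421413.787766160559841 4941008.479600958526134,"
            "409984.382813899661414 4941157.382794972509146,"
            "410185.135042413836345 4956566.578883905895054,"
            "421614.539994676539209 4956417.675689895637333))")

# Neatline estimated by eye in QGIS (from this message).
ESTIMATED = ("POLYGON ((420793 4955647,420689 4941858,"
             "410784 4942004,410971 4955792))")

if __name__ == "__main__":
    reported = ring_area(parse_polygon_ring(REPORTED))
    estimated = ring_area(parse_polygon_ring(ESTIMATED))
    print("reported / estimated area ratio: %.2f" % (reported / estimated))
```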