On Tue, Jul 3, 2018 at 2:37 AM, Christoph Hormann <o...@imagico.de> wrote:
> On Monday 02 July 2018, Greg Morgan wrote:
> > I have started work on the Bing building import for Arizona us. I
> > have started this page here
> > https://wiki.openstreetmap.org/wiki/Import/Catalogue/US/BingBuildings
> > for the import. This wiki page may be used by other mappers in
> > different states.
> >
> > [...]
>
> First thanks for bringing this up early in the process - although this
> is too early obviously for an import review it is good to have a broad
> discussion early.
>

Christoph,

Based on another email that you sent, I now see that there are two
Microsoft efforts. There is the older version here
https://wiki.openstreetmap.org/wiki/Microsoft_Building_Footprint_Data
and the newer version here
https://github.com/Microsoft/USBuildingFootprints

https://blogs.bing.com/maps/2018-06/microsoft-releases-125-million-building-footprints-in-the-us-as-open-data
The Bing Maps team has been applying these techniques as well with the
goal to increase the coverage of building footprints available for
OpenStreetMap <https://www.openstreetmap.org/>. As a result, today we
are announcing that we are releasing 124 Million building footprints in
the United States to the OpenStreetMap community.

> A few points i would like to comment on:
>
> * legal aspects: Microsoft released the data under the ODbL but does
> not specify what data sources go into producing it (in particular
> training data!) and does not make any claims that the data is free of
> third party rights. I would not be fine with importing data of unknown
> provenance and without a meaningful guarantee that it is free of third
> party rights.
>

Some of the answers are here.
https://github.com/Microsoft/USBuildingFootprints
Specifically, that means the sub-meter aerial images flown over the
metro Phoenix area will be better than the satellite images used in
rural Arizona. As far as I have read, they are using the same Bing
imagery that I use in JOSM.
I believe that the provenance is their own data, as noted here.

Training details
The training set consists of 5 million labeled images. Majority of the
satellite images cover diverse residential areas in US. For the sake of
good set representation, we have enriched the set with samples from
various areas covering mountains, glaciers, forests, deserts, beaches,
coasts, etc. Images in the set are of 256x256 pixel size with 1 ft/pixel
resolution. The training is done with CNTK toolkit using 32 GPUs.

Data Vintage
The vintage of the footprints depends on the vintage of the underlying
imagery. Because Bing Imagery is a composite of multiple sources it is
difficult to know the exact dates for individual pieces of data.

>
> * quality aspects: In contrast to almost all other data sets where
> there is some quantitative specification of quality (either explicitly
> or implicitly due to the purpose the data set is created for) there is
> no indication of quality in what Microsoft has released beyond the
> vague and meaningless 'awesome quality' claims. IMO this means that a
> proper import review would only be possible based on a thorough
> analysis of the quality of Microsoft's product that holds up to
> scientific scrutiny.
>

That is part of the process that I am undertaking now, based on their
official answer. I posted this sample earlier today:
https://drive.google.com/open?id=1I7BPMKLgABk8ikUdEPFpl6zKgh9E-sDN
Hans had a look at the data with me. The file has too many nodes to
import under the 10,000 node limit. The footprints have around the same
level of detail as this subdivision entered by craft mapper Turtur, a
German mapper, based on his edits:
https://www.openstreetmap.org/#map=17/33.67387/-112.40286
The Bing footprints look to be of a higher quality than the work of
craft mapper pezizomycotina, a US mapper from Pennsylvania.
https://www.openstreetmap.org/#map=16/33.6095/-111.9375

It looks like the CNTK-based process ("we apply our Deep Neural Networks
and the ResNet34 with RefineNet up-sampling layers to detect building
footprints from the Bing imagery", https://github.com/Microsoft/CNTK)
has problems with buildings that have black rooftops and with solar
panels used as roofs over covered parking spaces. In my early opinion,
the footprints are no better and no worse than a craft mapper's drawing.
The craft mapper's skill level may play a large part in the quality. I
sent a thank you to Turtur for the buildings. I have yet to do so to
pezizomycotina. However, a nice square building is always a good start.
I keep thinking that a couple of Mapillary runs would allow me to
collect addresses and build on pezizomycotina's work.

https://github.com/Microsoft/USBuildingFootprints
How good is the data?
Our metrics show that in the vast majority of cases the quality is at
least as good as data hand digitized buildings in OpenStreetMap. It is
not perfect, particularly in dense urban areas but it is still awesome.

> Regarding quality in general - you should not make the mistake of trying
> to assess quality by picking a few places and manually reviewing the
> data based on gut feeling - possibly with the same imagery used as
> reference as Microsoft used in data set generation. What i
> called "analysis of the quality that holds up to scientific scrutiny"
> means picking a sufficiently large number of sample locations
> representative for the diverse geography of the US and doing a
> quantitative analysis based on reference data of known and high
> quality.
>

I have created my tool to generate potential import candidates. I have
not had the chance to explore more of the data yet. Your other post
suggested starting with Montana, but that will not be useful in my case.
The rural Arizona area I posted is no different from Montana. I've lived
in both places. I will still create some other files to explore this
topic.
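
One way to move from hand-picked spots toward the representative sample
asked for above could be to draw review tiles at random from several
density classes. This is only a minimal sketch, not the tool mentioned
above: the function name, the building-count cut-offs, and the idea of
stratifying by per-tile building count are all assumptions made for
illustration.

```python
import random


def stratified_tile_sample(tile_counts, bounds=(10, 1000), per_stratum=5, seed=0):
    """Pick up to per_stratum tiles from each building-density stratum.

    tile_counts maps a tile id, e.g. (zoom, x, y), to its building count.
    bounds are hypothetical cut-offs separating sparse, medium and dense tiles.
    """
    strata = [[] for _ in range(len(bounds) + 1)]
    # Sort first so the result depends only on the data and the seed.
    for tile, count in sorted(tile_counts.items()):
        index = sum(count >= b for b in bounds)  # number of cut-offs reached
        strata[index].append(tile)

    rng = random.Random(seed)
    return [rng.sample(tiles, min(per_stratum, len(tiles))) for tiles in strata]
```

The sampled tiles would still need to be compared against reference data
of known quality; the sketch only covers choosing where to look.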

Zoom 13 tiles can be used to group many of the buildings. You can see
the Arizona counts here:
https://drive.google.com/open?id=1_ciQdkkC655xUqoKI_as4uJjQ6aVqAjfU5bddyk8Q1k
In the cases where a tile has over 9,999 nodes, a zoom 15 or zoom 16
aggregate will work best.
https://drive.google.com/open?id=12oQ6NxpyDRrnMjfGnANVGFD3fOxd_reZ0iiZstymg4A

> Microsoft's process documentation contains a number of hints that
> indicate things can go wrong in the process in ways that are likely to
> produce significant errors of kinds that are very unlikely to happen in
> manual mapping. Without having reliable data on how often these things
> do happen (and how this varies between different geographic settings)
> you would essentially be doing a blind import.
>

Depending on the craft mapper, hand drawn buildings can have the same
problems. In one area of the data that was posted, two separate
buildings were drawn as one. I've seen this same effect with craft
mappers too. Unless you know that there is a gap, it is easier to draw
the buildings as one. As noted in the prior answer, this will not be a
blind import. There will be a need to evaluate the overlap in coverage
between OSM and the Bing data. Let me post this again:

Training details
The training set consists of 5 million labeled images. Majority of the
satellite images cover diverse residential areas in US. For the sake of
good set representation, we have enriched the set with samples from
various areas covering mountains, glaciers, forests, deserts, beaches,
coasts, etc. Images in the set are of 256x256 pixel size with 1 ft/pixel
resolution. The training is done with CNTK toolkit using 32 GPUs.

There are 9,646 potential tiles to import for Arizona based on zoom 13
tiles. The tiles range from a single building to many thousands of
buildings. Some of the smaller tiles can be aggregated to reduce the
number of changes to manage.
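
The grouping described above can be sketched roughly as follows. This is
not the actual tool, only an illustration under stated assumptions: the
tile math is the standard slippy-map formula, while the function names,
the choice to assign a building to a tile by its first vertex, and the
step-by-step re-tiling from zoom 13 up to zoom 16 are hypothetical.

```python
import math
from collections import defaultdict

MAX_NODES = 9_999  # split any group holding more nodes than this


def tile_for(lat, lon, zoom):
    """Return the (x, y) slippy-map tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def group_buildings(rings, zoom=13, max_zoom=16, max_nodes=MAX_NODES):
    """Group closed (lon, lat) rings by tile, re-tiling oversized groups.

    Returns {(zoom, x, y): [ring, ...]}. A group that is still over
    max_nodes at max_zoom is kept as-is and would need manual splitting.
    """
    by_tile = defaultdict(list)
    for ring in rings:
        lon, lat = ring[0]  # assumption: the first vertex decides the tile
        by_tile[(zoom, *tile_for(lat, lon, zoom))].append(ring)

    groups = {}
    for key, members in by_tile.items():
        # A closed ring repeats its first vertex, so it has len - 1 nodes.
        nodes = sum(len(ring) - 1 for ring in members)
        if nodes > max_nodes and zoom < max_zoom:
            groups.update(group_buildings(members, zoom + 1, max_zoom, max_nodes))
        else:
            groups[key] = members
    return groups
```

Aggregating the many one-building tiles into larger batches would be a
separate step that this sketch does not attempt.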

The training details on their GitHub page show that they are trying to
improve the tool by testing different geographic areas.

Regards,
Greg
_______________________________________________ Imports mailing list Imports@openstreetmap.org https://lists.openstreetmap.org/listinfo/imports