Re: [Talk-transit] Naptan import
Christoph Böhme wrote:
> Hi
>
> "Roger Slevin" wrote:
>
>> Locality Classification was added as a possible "nice to have" to the
>> version 2 schema but it has not been populated, and no guidance has
>> been created to indicate how this field should be used (save for a
>> table of permitted values). There is no classification data in NPTG
>> other than that which comes from the source - and that is only there
>> because it could be ... I would not recommend its use as it is flaky,
>> and offers nothing in respect of newly created locality entries in
>> the Gazetteer.
>
> So, it looks like we will not have any classification information.
> Unless we just want to import the plain names this will complicate the
> import a bit as we have to somehow map the locations to OSM place-types.
> At the moment I have three ideas for how we could do this:
>
> Based on the parent relationship we could guess if a location might
> be a suburb or village.
>
> Many places have Wikipedia entries (even villages). If we can manage
> to automatically look the entries up and extract the relevant
> information (population size) from the info box we could probably
> classify a lot of places.
>
> The Landsat data might give us some hints about the size of places. We
> just need to find a way to retrieve this information automatically :-)
>
> Alternatively we could just invent a value for unclassified places and
> wait for people to classify the places.
>
> Do you have any other ideas?

Ask for local experts.

I have maintained a list of places in East Yorkshire in the wiki. There are about 280 villages and hamlets. I've visited almost 90% to map them and assess if they are really still a place. Many have been added from NPE and they just don't exist on the ground any more. I then judge village versus hamlet on criteria such as size, whether there is a school, church, shop etc., and what the Wikipedia entry or other web sites say. I then add local knowledge.

Having done this work I would prefer that a bulk upload doesn't add places in the county without prior discussion. You would probably be able to find someone to do a sanity check like this for many (most? all?) areas. My experience is that sources of UK places need human intervention to make them useful.

Cheers,

Chris

_______________________________________________
Talk-transit mailing list
Talk-transit@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-transit
Re: [Talk-transit] Naptan import
One other possibility that might work would be to look at the number of bus stops associated with a locality - something fairly easy to measure from NaPTAN. Combining this with the parent/child locality relationship could give you a way of expressing a sort of locality type classification.

Roger

-----Original Message-----
From: Christoph Böhme [mailto:christ...@b3e.net]
Sent: 27 July 2009 22:14
To: ro...@slevin.plus.com
Cc: 'Public transport/transit/shared taxi related topics'
Subject: Re: [Talk-transit] Naptan import

Hi

"Roger Slevin" wrote:

> Locality Classification was added as a possible "nice to have" to the
> version 2 schema but it has not been populated, and no guidance has
> been created to indicate how this field should be used (save for a
> table of permitted values). There is no classification data in NPTG
> other than that which comes from the source - and that is only there
> because it could be ... I would not recommend its use as it is flaky,
> and offers nothing in respect of newly created locality entries in
> the Gazetteer.

So, it looks like we will not have any classification information. Unless we just want to import the plain names this will complicate the import a bit as we have to somehow map the locations to OSM place-types. At the moment I have three ideas for how we could do this:

Based on the parent relationship we could guess if a location might be a suburb or village.

Many places have Wikipedia entries (even villages). If we can manage to automatically look the entries up and extract the relevant information (population size) from the info box we could probably classify a lot of places.

The Landsat data might give us some hints about the size of places. We just need to find a way to retrieve this information automatically :-)

Alternatively we could just invent a value for unclassified places and wait for people to classify the places.

Do you have any other ideas?

Cheers,

Christoph
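Roger's suggestion of combining the stop count per locality with the parent/child relationship could be sketched roughly as follows. This is only an illustration: the Locality record, the guess_place function and both thresholds are assumptions made up for the example, not part of NaPTAN, NPTG or any import script, and the cut-off values would need tuning against places whose type is already known.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Locality:
    code: str               # locality identifier (hypothetical field name)
    name: str
    parent: Optional[str]   # code of the parent locality, None at top level
    stop_count: int         # number of stops that reference this locality


def guess_place(loc: Locality, localities: Dict[str, Locality],
                village_min: int = 3, town_min: int = 40) -> str:
    """Guess an OSM place value from stop count and parent relationship.

    Both thresholds are illustrative assumptions, not calibrated values.
    """
    parent = localities.get(loc.parent) if loc.parent else None
    # A child of a town-sized locality is treated as a suburb.
    if parent is not None and parent.stop_count >= town_min:
        return "suburb"
    if loc.stop_count >= town_min:
        return "town"
    if loc.stop_count >= village_min:
        return "village"
    return "hamlet"


# Made-up sample data, for illustration only.
sample = {
    "A": Locality("A", "Bigtown", None, 60),
    "B": Locality("B", "Bigtown East", "A", 10),
    "C": Locality("C", "Midvale", None, 5),
    "D": Locality("D", "Lonely Farm End", None, 0),
}

for loc in sample.values():
    print(loc.name, "->", guess_place(loc, sample))
```

A locality with no stops at all falls through to "hamlet" here, which is a weak guess: a place without public transport is not necessarily small, so such results would still want a human check.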
Re: [Talk-transit] Naptan import
Hi

"Roger Slevin" wrote:

> Locality Classification was added as a possible "nice to have" to the
> version 2 schema but it has not been populated, and no guidance has
> been created to indicate how this field should be used (save for a
> table of permitted values). There is no classification data in NPTG
> other than that which comes from the source - and that is only there
> because it could be ... I would not recommend its use as it is flaky,
> and offers nothing in respect of newly created locality entries in
> the Gazetteer.

So, it looks like we will not have any classification information. Unless we just want to import the plain names this will complicate the import a bit as we have to somehow map the locations to OSM place-types. At the moment I have three ideas for how we could do this:

Based on the parent relationship we could guess if a location might be a suburb or village.

Many places have Wikipedia entries (even villages). If we can manage to automatically look the entries up and extract the relevant information (population size) from the info box we could probably classify a lot of places.

The Landsat data might give us some hints about the size of places. We just need to find a way to retrieve this information automatically :-)

Alternatively we could just invent a value for unclassified places and wait for people to classify the places.

Do you have any other ideas?

Cheers,

Christoph
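The population idea and the "invent a value for unclassified places" fallback from this message could be combined along these lines. It is a minimal sketch under stated assumptions: the classify function, the UNCLASSIFIED marker and every population threshold are invented for illustration (UK city status, for one, is not decided by population), and the population figure is assumed to have been obtained by some other means.

```python
from typing import Optional

# Hypothetical marker for places that nobody has classified yet.
UNCLASSIFIED = "unclassified"


def classify(population: Optional[int], parent_is_urban: bool) -> str:
    """Map a locality to an OSM place value.

    population      -- population figure if one could be found, else None
    parent_is_urban -- True if the parent locality is a town or city
    All thresholds below are illustrative assumptions.
    """
    # A locality inside a town or city is treated as a suburb.
    if parent_is_urban:
        return "suburb"
    # No population figure: leave the place for a mapper to classify.
    if population is None:
        return UNCLASSIFIED
    if population >= 100_000:
        return "city"
    if population >= 10_000:
        return "town"
    if population >= 200:
        return "village"
    return "hamlet"


print(classify(350, False))
print(classify(None, False))
print(classify(None, True))
```

Keeping the fallback explicit means an import never has to guess: anything without evidence stays marked for local mappers to review.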
Re: [Talk-transit] Naptan import
You ask about the omissions from NPTG. Perhaps it would be helpful if I described the history of creating NPTG and what the brief has been to local data editors in terms of what is or is not included in the database.

NPTG started life as a national statistical gazetteer based on a collation of different statistical areas (parishes, journey to work areas, towns, cities, etc). A number of unwanted types of entity in that source data were marked as inactive (things like area parishes which cover several villages) - and local editors were briefed to remove other sources of duplication.

We then had the difficulty of determining what is, and what is not, a locality. The guidance we have given has been that a locality is a place which locals would consider they lived in, worked in, were educated in etc ... and/or which highway engineers would consider appropriate to show on road direction signs. Although NPTG was originally for public transport purposes, we stressed at all times that a locality should be listed even if it has no public transport - but we know that some local editors have probably erred towards marking some unserved rural hamlets as "inactive".

All "inactive" localities should still be in the data - so hamlets which are missing may be in NPTG, but marked as "inactive". However, they may simply never have been in the source data - and no one to date has recognised the need to add them to NPTG.

It would be interesting to see what localities OSM holds in its data which are not included in NPTG (as well as the reverse of this) if that is possible.

I hope this helps your understanding of the background.

Roger

-----Original Message-----
From: talk-transit-boun...@openstreetmap.org [mailto:talk-transit-boun...@openstreetmap.org] On Behalf Of Christoph Böhme
Sent: 27 July 2009 21:50
To: Peter Miller
Cc: talk-transit@openstreetmap.org
Subject: Re: [Talk-transit] Naptan import

Good evening,

Peter Miller wrote:
> On 26 Jul 2009, at 22:14, Christoph Böhme wrote:
> > I also created a copy of the NOVAM viewer and changed it to display
> > NPTG data instead of bus stops:
> >
> > http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html
>
> Great stuff, and clearly there are many additional place-names in
> NPTG that are not in OSM at present in many parts of the country. I
> checked North Norfolk and bits of Scotland and there are a good
> number of additional places.

I have now also added all nodes with place=* tags from OSM. The NPTG import will really add a lot of additional places! OSM has only 25397 places in the UK at the moment. However, I was a bit surprised to see some hamlets in the OSM data which are not in the NPTG data. Do you know of any gaps in the NPTG data?

> The LocalityClassification field should be more useful and should
> contain city, town, village, hamlet, suburb, urbancentre, place of
> interest, other, or unrecorded. I am not sure how well this field is
> populated - possibly it is not well populated at all. UrbanCentre
> can possibly be ignored.

The LocalityClassification tag is used 856 times in the dataset. That is about 2% of all localities.

> The field may be well populated in some parts of the country and not
> in others. I am not sure how much NPTG is used for Points of Interest.
> There is a POI model in NPTG but possibly we treat this separately or
> not at all or import the data as invisible to start with. My main
> interest is the locality names and the main technical job will
> probably be to spot duplicates with what is in OSM already.

Finding duplicates should not be too difficult. We basically just need to check for each imported location if there are any places with the same name within a reasonable distance. Except for typos and different spellings that should work very well. The positions of locations in both datasets also match nicely which should make it even easier to find duplicates.

> Would it be worth creating a NPTG Import wiki page and an NPTG
> Import user to do the actual import - ie, keep the documentation and
> audit trail for the two imports separate?

I am in favour of keeping them separate. Both datasets are fairly independent and we will probably use different methods to import them. Having everything on one wiki page will be confusing to users, who might be interested only in one of the imports.

Cheers,

Christoph
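The two-way comparison Roger asks about (localities in OSM but not in NPTG, and the reverse) could be approximated by a set difference on normalised names. This is a rough sketch only: the normalise and compare helpers are hypothetical, the names in the example are made up, and a name-only comparison ignores position, so two different places that share a name would be treated as the same entry.

```python
import re
from typing import Iterable, Set, Tuple


def normalise(name: str) -> str:
    """Fold case, treat hyphens as spaces and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", name.replace("-", " ")).strip().casefold()


def compare(osm_names: Iterable[str],
            nptg_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names only in OSM, names only in NPTG), both normalised."""
    osm = {normalise(n) for n in osm_names}
    nptg = {normalise(n) for n in nptg_names}
    return osm - nptg, nptg - osm


# Made-up names, for illustration only.
only_osm, only_nptg = compare(
    ["Upper Example", "Sample-on-Sea", "Osmonly Green"],
    ["upper  example", "Sample on Sea", "Nptgonly Cross"],
)
print("Only in OSM: ", sorted(only_osm))
print("Only in NPTG:", sorted(only_nptg))
```

If the NPTG side is built from all localities, including those marked "inactive", the first result set would show places that were never in the source data at all, as opposed to places that a local editor merely deactivated.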
Re: [Talk-transit] Naptan import
Good evening,

Peter Miller wrote:
> On 26 Jul 2009, at 22:14, Christoph Böhme wrote:
> > I also created a copy of the NOVAM viewer and changed it to display
> > NPTG data instead of bus stops:
> >
> > http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html
>
> Great stuff, and clearly there are many additional place-names in
> NPTG that are not in OSM at present in many parts of the country. I
> checked North Norfolk and bits of Scotland and there are a good
> number of additional places.

I have now also added all nodes with place=* tags from OSM. The NPTG import will really add a lot of additional places! OSM has only 25397 places in the UK at the moment. However, I was a bit surprised to see some hamlets in the OSM data which are not in the NPTG data. Do you know of any gaps in the NPTG data?

> The LocalityClassification field should be more useful and should
> contain city, town, village, hamlet, suburb, urbancentre, place of
> interest, other, or unrecorded. I am not sure how well this field is
> populated - possibly it is not well populated at all. UrbanCentre
> can possibly be ignored.

The LocalityClassification tag is used 856 times in the dataset. That is about 2% of all localities.

> The field may be well populated in some parts of the country and not
> in others. I am not sure how much NPTG is used for Points of Interest.
> There is a POI model in NPTG but possibly we treat this separately or
> not at all or import the data as invisible to start with. My main
> interest is the locality names and the main technical job will
> probably be to spot duplicates with what is in OSM already.

Finding duplicates should not be too difficult. We basically just need to check for each imported location if there are any places with the same name within a reasonable distance. Except for typos and different spellings that should work very well. The positions of locations in both datasets also match nicely which should make it even easier to find duplicates.

> Would it be worth creating a NPTG Import wiki page and an NPTG
> Import user to do the actual import - ie, keep the documentation and
> audit trail for the two imports separate?

I am in favour of keeping them separate. Both datasets are fairly independent and we will probably use different methods to import them. Having everything on one wiki page will be confusing to users, who might be interested only in one of the imports.

Cheers,

Christoph
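The duplicate check described in this message (same name within a reasonable distance) could look something like the sketch below. It is an illustration under assumptions, not the import code: places are plain (name, latitude, longitude) tuples, the function names are invented, the 1 km radius is an arbitrary choice, and names are matched exactly after case folding, so typos and variant spellings are missed, as the message itself notes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Place = Tuple[str, float, float]  # (name, latitude, longitude) in degrees


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def find_duplicates(imported: List[Place], existing: List[Place],
                    max_km: float = 1.0) -> List[Tuple[Place, Place]]:
    """Pair each imported place with same-named existing places nearby."""
    # Index existing places by case-folded name to avoid a full scan.
    by_name: Dict[str, List[Place]] = defaultdict(list)
    for place in existing:
        by_name[place[0].casefold()].append(place)

    matches = []
    for place in imported:
        for candidate in by_name.get(place[0].casefold(), []):
            if distance_km(place[1], place[2],
                           candidate[1], candidate[2]) <= max_km:
                matches.append((place, candidate))
    return matches


# Made-up coordinates, for illustration only.
nptg = [("Testham", 52.0, -1.0), ("Otherby", 53.0, -1.0)]
osm = [("testham", 52.005, -1.0), ("Otherby", 54.0, -1.0)]
matches = find_duplicates(nptg, osm)
print(matches)
```

Here only "Testham" is reported, because the two "Otherby" entries are about a degree of latitude apart. Requiring both a name match and proximity keeps distinct places that happen to share a name from being merged.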
Re: [Talk-transit] Naptan import
Peter

Locality Classification was added as a possible "nice to have" to the version 2 schema but it has not been populated, and no guidance has been created to indicate how this field should be used (save for a table of permitted values). There is no classification data in NPTG other than that which comes from the source - and that is only there because it could be ... I would not recommend its use as it is flaky, and offers nothing in respect of newly created locality entries in the Gazetteer.

NPTG is NOT a POI directory - and whilst there are some incorrectly created localities for POIs we are seeking to get them removed unless they genuinely define a locality (so the only ones that are appropriate are those which relate to large area POIs that do not sit happily within general-purpose POIs).

The data that is recognised as valid at present is only that which appears in v2 CSV lists ... anything which is in the XML that is not in the CSV output is almost certainly not populated and certainly should be ignored.

Roger

-----Original Message-----
From: talk-transit-boun...@openstreetmap.org [mailto:talk-transit-boun...@openstreetmap.org] On Behalf Of Peter Miller
Sent: 27 July 2009 08:52
To: Christoph Böhme
Cc: talk-transit@openstreetmap.org
Subject: Re: [Talk-transit] Naptan import

On 26 Jul 2009, at 22:14, Christoph Böhme wrote:

> Hi
>
> Peter Miller wrote:
>> I am also aware that there is a 50K place gazetteer sitting there
>> untouched - last week I was adding villages in Norfolk by hand and
>> the data is sitting available in NPTG.
>
> I taught myself XSLT at the weekend and played a bit with the NPTG
> data. On http://www.mappa-mercia.org/nptg/ you can find some HTML
> pages which show the hierarchies of and adjacencies between the
> localities in the NPTG data.
>
> I also created a copy of the NOVAM viewer and changed it to display
> NPTG data instead of bus stops:
>
> http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html

Great stuff, and clearly there are many additional place-names in NPTG that are not in OSM at present in many parts of the country. I checked North Norfolk and bits of Scotland and there are a good number of additional places.

> I have not changed any of the texts/images yet, so the localities will
> be displayed as bus stops :-). I will try to import an excerpt of
> place names from OSM tomorrow so that we can compare both data sets.
>
> From what I have seen so far an import should not be too difficult.
> The only difficulties I expect are the hierarchies and the
> classification of the localities.
>
> Does anyone know the current way to tag hierarchies of places? I had a
> look at the wiki and there seem to be two approaches: is_in and
> relations. With the addition of actual borders there is also the
> possibility of defining hierarchies purely geometrically.
>
> The location classifications in the NPTG seem to be relatively coarse.
> Everything below a parish is either a "New Entry" (Add) or a Locality.
> We need to see how this can be mapped to POI types in OSM.

SourceLocalityType is, I think, information about where the data came from in the first place into NPTG and is not relevant for our purposes, and certainly not for the classification field.

The LocalityClassification field should be more useful and should contain city, town, village, hamlet, suburb, urbancentre, place of interest, other, or unrecorded. I am not sure how well this field is populated - possibly it is not well populated at all. UrbanCentre can possibly be ignored. The field may be well populated in some parts of the country and not in others. I am not sure how much NPTG is used for Points of Interest. There is a POI model in NPTG but possibly we treat this separately or not at all or import the data as invisible to start with. My main interest is the locality names and the main technical job will probably be to spot duplicates with what is in OSM already.

See page 69 in the NaPTAN and NPTG schema guide for more details of the formatting.
http://www.naptan.org.uk/documentation.htm

>> Do you need help with the NaPTAN import or are you just about ready
>> to do the work? Do we need to set up a wiki page where people can
>> request imports for their authority or are we going to do it without
>> that?

It would be really really good to get NaPTAN in and in soon. There are people keen to get on with sorting the data out in their areas who are sitting on their hands at present, the professional transport community is watching what is happening closely, and there are also possibly other datasets from UK authorities that could come our way when we have completed this one.

> I am happy to continue working on the NPTG import if Thomas does not
> mind.

My vote is to get on with it - the NPTG and NaPTAN imports are different enough that they can be handled separately. If Thomas focuses on the NaPTAN import (or hands it over to someone) and
Re: [Talk-transit] Naptan import
On 26 Jul 2009, at 22:14, Christoph Böhme wrote:

> Hi
>
> Peter Miller wrote:
>> I am also aware that there is a 50K place gazetteer sitting there
>> untouched - last week I was adding villages in Norfolk by hand and
>> the data is sitting available in NPTG.
>
> I taught myself XSLT at the weekend and played a bit with the NPTG
> data. On http://www.mappa-mercia.org/nptg/ you can find some HTML
> pages which show the hierarchies of and adjacencies between the
> localities in the NPTG data.
>
> I also created a copy of the NOVAM viewer and changed it to display
> NPTG data instead of bus stops:
>
> http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html

Great stuff, and clearly there are many additional place-names in NPTG that are not in OSM at present in many parts of the country. I checked North Norfolk and bits of Scotland and there are a good number of additional places.

> I have not changed any of the texts/images yet, so the localities will
> be displayed as bus stops :-). I will try to import an excerpt of
> place names from OSM tomorrow so that we can compare both data sets.
>
> From what I have seen so far an import should not be too difficult.
> The only difficulties I expect are the hierarchies and the
> classification of the localities.
>
> Does anyone know the current way to tag hierarchies of places? I had a
> look at the wiki and there seem to be two approaches: is_in and
> relations. With the addition of actual borders there is also the
> possibility of defining hierarchies purely geometrically.
>
> The location classifications in the NPTG seem to be relatively coarse.
> Everything below a parish is either a "New Entry" (Add) or a Locality.
> We need to see how this can be mapped to POI types in OSM.

SourceLocalityType is, I think, information about where the data came from in the first place into NPTG and is not relevant for our purposes, and certainly not for the classification field.

The LocalityClassification field should be more useful and should contain city, town, village, hamlet, suburb, urbancentre, place of interest, other, or unrecorded. I am not sure how well this field is populated - possibly it is not well populated at all. UrbanCentre can possibly be ignored. The field may be well populated in some parts of the country and not in others. I am not sure how much NPTG is used for Points of Interest. There is a POI model in NPTG but possibly we treat this separately or not at all or import the data as invisible to start with. My main interest is the locality names and the main technical job will probably be to spot duplicates with what is in OSM already.

See page 69 in the NaPTAN and NPTG schema guide for more details of the formatting.
http://www.naptan.org.uk/documentation.htm

>> Do you need help with the NaPTAN import or are you just about ready
>> to do the work? Do we need to set up a wiki page where people can
>> request imports for their authority or are we going to do it without
>> that?

It would be really really good to get NaPTAN in and in soon. There are people keen to get on with sorting the data out in their areas who are sitting on their hands at present, the professional transport community is watching what is happening closely, and there are also possibly other datasets from UK authorities that could come our way when we have completed this one.

> I am happy to continue working on the NPTG import if Thomas does not
> mind.

My vote is to get on with it - the NPTG and NaPTAN imports are different enough that they can be handled separately. If Thomas focuses on the NaPTAN import (or hands it over to someone) and you do the NPTG then I think we will get there faster.

Would it be worth creating a NPTG Import wiki page and an NPTG Import user to do the actual import - ie, keep the documentation and audit trail for the two imports separate?

Regards,

Peter

> Christoph
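On the question of tagging hierarchies: if the is_in approach were chosen, its value could be derived from the NPTG parent relationship by walking up the chain of parents. The sketch below only illustrates that idea. The is_in_value helper, the two dictionaries and the sample names are hypothetical, and the comma-separated, nearest-parent-first ordering is an assumption about how the tag would be written.

```python
from typing import Dict, Optional


def is_in_value(code: str, names: Dict[str, str],
                parents: Dict[str, Optional[str]]) -> str:
    """Build an is_in string from a locality's chain of parents.

    names   -- locality code -> display name
    parents -- locality code -> parent code (None at the top level)
    """
    chain = []
    seen = {code}  # guards against accidental cycles in the data
    current = parents.get(code)
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(names[current])
        current = parents.get(current)
    return ", ".join(chain)


# Made-up hierarchy, for illustration only.
names = {"h": "Hamlet X", "t": "Town Y", "c": "County Z"}
parents = {"h": "t", "t": "c", "c": None}

print(is_in_value("h", names, parents))
print(repr(is_in_value("c", names, parents)))
```

A top-level locality yields an empty string, in which case the tag would simply be omitted. Relations or boundary geometry, the other two options mentioned in the message, would not need this step at all.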