Re: [Talk-transit] Naptan import

2009-07-27 Thread Roger Slevin
Peter

Locality Classification was added as a possible nice to have to the
version 2 schema but it has not been populated, and no guidance has been
created to indicate how this field should be used (save for a table of
permitted values).  There is no classification data in NPTG other than that
which comes from the source - and that is only there because it could be ...
I would not recommend its use as it is flaky, and offers nothing in respect
of newly created locality entries in the Gazetteer.

NPTG is NOT a POI directory - and whilst there are some incorrectly created
localities for POIs, we are seeking to get them removed unless they genuinely
define a locality (so the only ones that are appropriate are those which
relate to large-area POIs that do not sit happily within general-purpose
POIs).

The data that is recognised as valid at present is only that which appears
in v2 CSV lists ... anything which is in the XML that is not in the CSV
output is almost certainly not populated and certainly should be ignored.

Roger

-----Original Message-----
From: talk-transit-boun...@openstreetmap.org
[mailto:talk-transit-boun...@openstreetmap.org] On Behalf Of Peter Miller
Sent: 27 July 2009 08:52
To: Christoph Böhme
Cc: talk-transit@openstreetmap.org
Subject: Re: [Talk-transit] Naptan import


On 26 Jul 2009, at 22:14, Christoph Böhme wrote:

 Hi

 Peter Miller peter.mil...@itoworld.com wrote:
 I am also aware that there is a 50K place gazetteer sitting there
 untouched - last week I was adding villages in Norfolk by hand and
 the data is sitting available in NPTG.

 I taught myself XSLT at the weekend and played a bit with the NPTG
 data. On http://www.mappa-mercia.org/nptg/ you can find some HTML
 pages which show the hierarchies of and adjacencies between the
 localities in the NPTG data.

 I also created a copy of the NOVAM viewer and changed it to display
 NPTG data instead of bus stops:

 http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html

Great stuff, and clearly there are many additional place-names in NPTG
that are not in OSM at present in many parts of the country. I checked
North Norfolk and bits of Scotland and there are a good number of
additional places.


 I have not changed any of the texts/images yet, so the localities will
 be displayed as bus stops :-). I will try to import an excerpt of
 place names from OSM tomorrow so that we can compare both data sets.

 From what I have seen so far an import should not be too difficult.
 The only difficulties I expect are the hierarchies and the
 classification of the localities.

 Does anyone know the current way to tag hierarchies of places? I had a
 look at the wiki and there seem to be two approaches: is_in and
 relations. With the addition of actual borders there is also the
 possibility of defining hierarchies purely geometrically.

 The location classifications in the NPTG seem to be relatively coarse.
 Everything below a parish is either a New Entry (Add) or a Locality.
 We need to see how this can be mapped to POI types in OSM.

SourceLocalityType is, I think, information about where the data came
from in the first place into NPTG and is not relevant for our
purposes, and certainly not for the classification field.

The LocalityClassification field should be more useful and should
contain city, town, village, hamlet, suburb, urbancentre, place of
interest, other, or unrecorded. I am not sure how well this field is
populated - possibly it is not well populated at all. UrbanCentre can
possibly be ignored.  The field may be well populated in some parts of
the country and not in others. I am not sure how much NPTG is used for
Points of Interest. There is a POI model in NPTG but possibly we treat
this separately or not at all, or import the data as invisible to start
with. My main interest is the locality names and the main technical
job will probably be to spot duplicates with what is in OSM already.

See page 69 in the NaPTAN and NPTG Schema Guide for more details of
the formatting.
http://www.naptan.org.uk/documentation.htm


 Do you need help with the NaPTAN import or are you just about ready
 to do the work? Do we need to set up a wiki page where people can
 request imports for their authority or are we going to do it without
 that?


It would be really really good to get NaPTAN in and in soon. There are  
people keen to get on with sorting the data out in their areas who are  
sitting on their hands at present, the professional transport  
community is watching what is happening closely, and there are also  
possibly other datasets from UK authorities that could come our way  
when we have completed this one.

 I am happy to continue working on the NPTG import if Thomas does not
 mind.

My vote is to get on with it - the NPTG and NaPTAN imports are  
different enough that they can be handled separately. If Thomas  
focuses on the NaPTAN import (or hands it over to someone) and you do  
the NPTG then I 

Re: [Talk-transit] Naptan import

2009-07-27 Thread Christoph Böhme
Good evening,

Peter Miller peter.mil...@itoworld.com wrote:
 On 26 Jul 2009, at 22:14, Christoph Böhme wrote:
  I also created a copy of the NOVAM viewer and changed it to display
  NPTG data instead of bus stops:
 
  http://www.mappa-mercia.org/cgi-bin/nptg.wsgi/viewer.html
 
 Great stuff, and clearly there are many additional place-names in
 NPTG that are not in OSM at present in many parts of the country. I
 checked North Norfolk and bits of Scotland and there are a good
 number of additional places.

I have now also added all nodes with place=* tags from OSM. The NPTG
import will really add a lot of additional places! OSM has only 25397
places in the UK at the moment. However, I was a bit surprised to see
some hamlets in the OSM data which are not in the NPTG data. Do you
know of any gaps in the NPTG data?

 The LocalityClassification field should be more useful and should  
 contain city, town, village, hamlet, suburb, urbancentre, place of  
 interest, other, or unrecorded. I am not sure how well this field is  
 populated - possibly it is not well populated at all. UrbanCentre
 can possibly be ignored.  

The LocalityClassification tag is used 856 times in the dataset. That is
about 2% of all localities.
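
A count like this could be reproduced roughly as follows. This is only a sketch: the element name NptgLocality, the namespace and the tiny SAMPLE document are assumptions made for illustration, not a description of how the figure above was obtained. It reports how many localities carry a non-empty LocalityClassification.

```python
import xml.etree.ElementTree as ET

# Made-up two-locality sample; real NPTG files are far larger and the exact
# element layout shown here is an assumption for illustration only.
SAMPLE = """<NationalPublicTransportGazetteer xmlns="http://www.naptan.org.uk/">
  <NptgLocalities>
    <NptgLocality><Descriptor><LocalityName>A</LocalityName></Descriptor>
      <LocalityClassification>village</LocalityClassification></NptgLocality>
    <NptgLocality><Descriptor><LocalityName>B</LocalityName></Descriptor></NptgLocality>
  </NptgLocalities>
</NationalPublicTransportGazetteer>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def classification_coverage(xml_text, locality_tag="NptgLocality",
                            field_tag="LocalityClassification"):
    """Return (localities with a non-empty classification, all localities)."""
    root = ET.fromstring(xml_text)
    total = 0
    populated = 0
    for elem in root.iter():
        if local_name(elem.tag) != locality_tag:
            continue
        total += 1
        # Look for a non-empty classification anywhere below this locality.
        for child in elem.iter():
            if local_name(child.tag) == field_tag and (child.text or "").strip():
                populated += 1
                break
    return populated, total


populated, total = classification_coverage(SAMPLE)
print(f"{populated} of {total} localities classified ({populated / total:.0%})")
```

Matching on the local tag name keeps the sketch independent of whichever namespace the file declares.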

 The field may be well populated in some parts of the country and not
 in other. I am not sure how much NPTG is used for Points of Interest.
 There is a POI model in NPTG but possibly we treat this separately or
 not at all or import the data as invisible to start with. My main
 interest is the locality names and the main technical job will
 probably be to spot duplicates with what is in OSM already.

Finding duplicates should not be too difficult. We basically just need
to check for each imported location if there are any places with the
same name within a reasonable distance. Except for typos and different
spellings that should work very well. The positions of locations in
both datasets also match nicely which should make it even easier to
find duplicates.
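
The check described above could be sketched like this. It is a minimal illustration, not a finished import tool: the record layout (dicts with name, lat, lon), the 2 km threshold and the helper names are all assumptions. Names are compared after a simple normalisation, so differences in case, accents and punctuation are tolerated, but real typos are not.

```python
import math
import unicodedata


def normalise(name):
    """Lower-case and drop accents, spaces and punctuation, so that
    "St. Mary's" and "st marys" compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in stripped.lower() if c.isalnum())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def find_duplicates(nptg_places, osm_places, max_km=2.0):
    """Return (nptg, osm) pairs with the same normalised name within max_km."""
    by_name = {}
    for place in osm_places:
        by_name.setdefault(normalise(place["name"]), []).append(place)
    matches = []
    for cand in nptg_places:
        for existing in by_name.get(normalise(cand["name"]), []):
            dist = haversine_km(cand["lat"], cand["lon"],
                                existing["lat"], existing["lon"])
            if dist <= max_km:
                matches.append((cand, existing))
    return matches


# Invented example records, for illustration only.
nptg = [{"name": "Little Snoring", "lat": 52.860, "lon": 0.900},
        {"name": "Newtown", "lat": 52.0, "lon": -1.0}]
osm = [{"name": "little snoring", "lat": 52.861, "lon": 0.901},
       {"name": "Newtown", "lat": 55.0, "lon": -3.0}]
duplicates = find_duplicates(nptg, osm)
```

Indexing the OSM places by normalised name first means each NPTG locality is only compared against places that share its name, which keeps the check cheap even for tens of thousands of entries.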

 Would it be worth creating a NPTG Import wiki page and an NPTG
 Import user to do the actual import - ie, keep the documentation and
 audit trail for the two imports separate?

I am in favour of keeping them separate. Both datasets are fairly
independent and we will probably use different methods to import them.
Having everything on one wiki page will be confusing to users, who might
be interested only in one of the imports.

Cheers,
Christoph

___
Talk-transit mailing list
Talk-transit@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-transit


Re: [Talk-transit] Naptan import

2009-07-27 Thread Christoph Böhme
Hi

Roger Slevin ro...@slevin.plus.com wrote:

 Locality Classification was added as a possible nice to have to the
 version 2 schema but it has not been populated, and no guidance has
 been created to indicate how this field should be used (save for a
 table of permitted values).  There is no classification data in NPTG
 other than that which comes from the source - and that is only there
 because it could be ... I would not recommend its use as it is flaky,
 and offers nothing in respect of newly created locality entries in
 the Gazetteer.

So, it looks like we will not have any classification information.
Unless we just want to import the plain names this will complicate the
import a bit, as we have to somehow map the locations to OSM place types.
At the moment I have three ideas for how we could do this:

Based on the parent relationship we could guess whether a location might
be a suburb or a village.

Many places have Wikipedia entries (even villages). If we can manage
to look the entries up automatically and extract the relevant
information (population size) from the info box, we could probably
classify a lot of places.

The Landsat data might give us some hints about the size of places. We
just need to find a way to retrieve this information automatically :-)

Alternatively we could just invent a value for unclassified places and
wait for people to classify the places.

Do you have any other ideas?

Cheers,
Christoph



Re: [Talk-transit] Naptan import

2009-07-27 Thread Roger Slevin
You ask about the omissions from NPTG.  Perhaps it would be helpful if I 
described the history of creating NPTG and what the brief has been to local 
data editors in terms of what is or is not included in the database.

NPTG started life as a national statistical gazetteer based on a collation of 
different statistical areas (parishes, journey to work areas, towns, cities, 
etc).  A number of unwanted types of entity in that source data were marked as 
inactive (things like area parishes which cover several villages) - and local 
editors were briefed to remove other sources of duplication.

We then had the difficulty of determining what is, and what is not, a locality.
The guidance we have given has been that a locality is a place which locals
would consider they lived in, worked in, were educated in etc ... and/or
which highway engineers would consider it appropriate to show on road direction
signs.  Although NPTG was originally for public transport purposes, we stressed
at all times that a locality should be listed even if it has no public
transport - but we know that some local editors have probably erred towards
marking some unserved rural hamlets as inactive.

All inactive localities should still be in the data - so hamlets which are 
missing may be in NPTG, but marked as inactive.  However they may simply 
never have been in the source data - and no one to date has recognised the need 
to add them to NPTG.  It would be interesting to see what localities OSM holds 
in its data which are not included in NPTG (as well as the reverse of this) if 
that is possible.

I hope this helps your understanding of the background.

Roger



Re: [Talk-transit] Naptan import

2009-07-27 Thread Roger Slevin
One other possibility that might work would be to look at the number of bus
stops associated with a locality - something fairly easy to measure from
NaPTAN.  Combining this with the parent / child locality relationship could
give you a way of expressing a sort of locality type classification.
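
As a rough sketch of that idea, assuming each stop record carries a reference to its locality and each locality may name a parent: the thresholds, the dict layout, the function name and the "unclassified" placeholder are all invented for illustration and would need tuning against real data.

```python
from collections import Counter


def guess_place_type(locality_id, stop_counts, parent_of,
                     town_min=50, village_min=5):
    """Guess an OSM-style place type from stop counts and the parent link.

    town_min and village_min are arbitrary illustrative thresholds.
    """
    stops = stop_counts.get(locality_id, 0)
    parent = parent_of.get(locality_id)
    # A child of a locality that itself looks town-sized is treated as a suburb.
    if parent is not None and stop_counts.get(parent, 0) >= town_min:
        return "suburb"
    if stops >= town_min:
        return "town"
    if stops >= village_min:
        return "village"
    if stops > 0:
        return "hamlet"
    # No stops at all: leave it for a human to classify.
    return "unclassified"


# Invented data: one locality reference per stop, plus parent links.
stop_locality_refs = ["L1"] * 60 + ["L2"] * 8 + ["L3"] * 1 + ["L4"] * 3
stop_counts = Counter(stop_locality_refs)
parent_of = {"L4": "L1"}

guesses = {loc: guess_place_type(loc, stop_counts, parent_of)
           for loc in ["L1", "L2", "L3", "L4", "L5"]}
```

Localities with no stops fall through to the placeholder value rather than being forced into a type, since the stop count says nothing useful about them.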

Roger




Re: [Talk-transit] Naptan import

2009-07-27 Thread Chris Hill
Christoph Böhme wrote:
 Hi

 Roger Slevin ro...@slevin.plus.com wrote:

   
 Locality Classification was added as a possible nice to have to the
 version 2 schema but it has not been populated, and no guidance has
 been created to indicate how this field should be used (save for a
 table of permitted values).  There is no classification data in NPTG
 other than that which comes from the source - and that is only there
 because it could be ... I would not recommend its use as it is flaky,
 and offers nothing in respect of newly created locality entries in
 the Gazetteer.
 

 So, it looks like we will not have any classification information.
 Unless we just want to import the plain names this will complicate the
 import a bit, as we have to somehow map the locations to OSM place types.
 At the moment I have three ideas for how we could do this:

 Based on the parent relationship we could guess whether a location might
 be a suburb or a village.

 Many places have Wikipedia entries (even villages). If we can manage
 to look the entries up automatically and extract the relevant
 information (population size) from the info box, we could probably
 classify a lot of places.

 The Landsat data might give us some hints about the size of places. We
 just need to find a way to retrieve this information automatically :-)

 Alternatively we could just invent a value for unclassified places and
 wait for people to classify the places.

 Do you have any other ideas?

   
Ask for local experts.  I have maintained a list of places in East
Yorkshire in the wiki.  There are about 280 villages and hamlets.  I've
visited almost 90% to map them and assess if they are really still a
place.  Many have been added from NPE and they just don't exist on the
ground any more.  I then judge village versus hamlet on criteria such as
size, whether there is a school, church, shop etc., and what the Wikipedia
entry or other web sites say.  I then add local knowledge.

Having done this work I would prefer that a bulk upload doesn't add 
places in the county without prior discussion.  You would probably be 
able to find someone to do a sanity check like this for many (most? 
all?) areas.  My experience is that sources of UK places need human 
intervention to make them useful.

Cheers, Chris
