[Talk-transit] Re: GTFS compatibility

2010-06-30 Thread Hillsman, Edward
Our center has a project to explore the use of OSM as a repository and tool for 
supporting multimodal trip planners (for example, bike to transit, ride the 
bus, walk or bike to final destination). We are keenly interested in the 
current discussion of transit and GTFS in OSM, because one of our tasks is to 
develop software to import from GTFS into OSM, and then update the import as a 
transit agency modifies its routes or stops, taking into account that OSM 
mappers may have found and corrected errors in what was uploaded (or may have 
introduced errors). I'm writing to share some of our experience and get your 
suggestions. We will make the software we develop in this project (for 
uploading, matching, and updating GTFS data in OSM) publicly available.

We think it should be relatively easy to upload a set of GTFS stops into an 
area where no one has mapped bus stops into OSM. Generating the route relations 
will be harder and we may not accomplish that as part of this project. And we 
think that updating such data will be relatively simple, because it can rely on 
tags identifying and cross-referencing the stops; software would look for 
changes, and manual work would be needed to reconcile them. The hard part is 
going to be designing the initial upload process to work in areas where OSM 
already includes some bus stops, but not all of them. In the state of Florida, 
where we are working, there are about 450 stops already in OSM, many in areas 
served by transit agencies with GTFS data. Obviously, we want to respect what 
has been mapped. Things that complicate the initial upload include:

(1) Locational errors in the GTFS data. These are not systematic, and some are 
surprisingly large. One stop is placed more than 200 meters from its actual 
location, and only about 10 meters from another stop that GTFS places within 
10 meters of its actual location (and that is mapped accurately in OSM). We 
came into this project knowing that there is locational error in GTFS. Now we 
are trying to figure out how to deal with it. The GTFS locations do match 
those appearing in Google Transit, by the way.
(2) Locational errors in the OSM data. These aren't systematic either but tend 
to be much smaller, except that in a few cases the stop has been recorded on 
the wrong side of the street, and a mapper in one city has recorded stops as 
nodes defining the street way rather than as points to the sides of the street.
(3) Incomplete and inconsistent tagging of the OSM stops. 
(4) The presence in an area of stops for multiple agencies, only one of which 
has GTFS data. Our campus has a shuttle bus circulator system with no GTFS data 
(they operate without a set schedule but with a target 10-minute headway, and 
frequency changes during the day and with the university class schedule). The 
area's main public transportation agency has several routes that pass through 
the campus, and has GTFS data. Most of the public-agency stops on campus, but 
not all, are also campus shuttle stops, and there are many more shuttle stops 
on campus than there are public-agency stops.
(5) Incomplete mapping of stops for each agency in OSM.
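
As a rough illustration of why items (1) and (2) complicate matching, here is 
a minimal Python sketch that screens one GTFS stop against existing OSM stops 
by distance alone. The 30-meter radius, the dictionary layout, and the 
function names are assumptions made for this example; they are not taken from 
the project's software.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def screen_stop(gtfs_stop, osm_stops, radius_m=30.0):
    """Classify one GTFS stop against existing OSM stops.

    Returns ("new", []) when no OSM stop lies within radius_m,
    ("candidate", [stop]) when exactly one does, and
    ("ambiguous", [stops...]) when several do.
    radius_m is an assumed threshold, not a recommended value.
    """
    near = [
        o for o in osm_stops
        if haversine_m(gtfs_stop["lat"], gtfs_stop["lon"], o["lat"], o["lon"]) <= radius_m
    ]
    if not near:
        return ("new", [])
    if len(near) == 1:
        return ("candidate", near)
    return ("ambiguous", near)
```

With errors of 200 meters in the source data, even a "candidate" result can 
point at the wrong stop, so the output is best treated as a worklist for 
manual review rather than as a match.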

At the moment, we are rethinking the whole idea of trying to match the GTFS 
stops to the OSM stops for the initial upload. One idea would be to screen all 
stops in a GTFS area to look for tags indicating the operator (or no operator), 
tag all of them with a FIXME describing that an upload has occurred and may 
produce duplicates, but otherwise leave them alone, and then upload the GTFS 
ones. I see problems with that, and in any case it should be done only if there 
is a commitment by the uploader to work quickly to reconcile the two data sets 
in OSM. Given the surprisingly large locational errors in GTFS, I'm also 
uncomfortable with simply uploading it, because putting bad data into the 
system will create confusion. I suspect this is a problem with all uploads. 
We've certainly seen it with the TIGER street data.
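
The screening idea above could be sketched roughly as follows in Python. 
Everything here is hypothetical: the in-memory stop dictionaries, the function 
name, the lowercase fixme key, and the choice to flag only stops whose 
operator tag is missing or matches the GTFS agency. It only prepares the 
flagged copies and does not talk to the OSM API.

```python
def flag_possible_duplicates(osm_stops, operator, note):
    """Return flagged copies of existing bus stops that an upload might duplicate.

    A stop is flagged when it is tagged highway=bus_stop and its operator tag
    is either absent or equal to `operator`. The input stops are not modified.
    """
    flagged = []
    for stop in osm_stops:
        tags = dict(stop["tags"])  # copy so the original stays untouched
        if tags.get("highway") != "bus_stop":
            continue
        if tags.get("operator") not in (None, operator):
            continue  # clearly belongs to another operator; leave it alone
        existing = tags.get("fixme")
        # Append to an existing note instead of overwriting a mapper's comment.
        tags["fixme"] = f"{existing}; {note}" if existing else note
        flagged.append({**stop, "tags": tags})
    return flagged
```

Skipping stops tagged with a different operator is a judgment call. On a 
campus where most public-agency stops are also shuttle stops, that filter 
would miss shared stops, which is one of the problems with this approach.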

But we are still in the thinking-about-this stage, haven't made any decisions, 
and are looking for suggestions and comments (hence this posting). Until we get 
a much better handle on the initial upload problems, any actual uploading we do 
as part of the project will be limited to the area of our campus, where we know 
what is actually on the ground and can clean up anything we do. We'd definitely 
enjoy sharing work and ideas.

Ed Hillsman

Edward L. Hillsman, Ph.D.
Senior Research Associate
Center for Urban Transportation Research
University of South Florida
4202 Fowler Ave., CUT100
Tampa, FL  33620-5375
813-974-2977 (tel)
813-974-5168 (fax)
hills...@cutr.usf.edu
http://www.cutr.usf.edu



On Tue, 29 Jun 2010 15:26:07 +0100 Joe Hughes j...@headwayblog.com wrote:
I agree that it would be helpful to end up with something that allows
straightforward conversions to and from the GTFS format.  GTFS is a
CC-licensed specification [1] which is evolved by an open community
process [2].  Also, the 

Re: [Talk-transit] GTFS compatibility

2010-06-30 Thread Joe Hughes
Ed,

Great to see someone from the CUTR efforts chiming in here.

Just to clarify one point: when you say "locational errors in GTFS,"
you're referring to issues with the source data from the particular
agencies that you're working with, rather than anything having to do
with the representation format itself.  This has also been an
occasional issue with stop data being imported into OSM from NaPTAN.

One of the most important things that we can accomplish with these
efforts is to help find ways to establish two-way flows between these
official sources of data and the distributed army of volunteers and
developers who have a vested interest in improving the accuracy of
their local data.  The progress with this here in the UK has been slow
but encouraging, and there has lately been a lot of good work by the
transit agencies in Boston and New York to be more responsive to
feedback from consumers of the data.

Cheers,
Joe


Re: [Talk-transit] GTFS compatibility

2010-06-30 Thread john whelan
I'm currently looking at bus stops in Ottawa in OSM and finding similar
issues with the existing bus_stops.  Where stop_codes exist, I'm seriously
wondering whether one approach might be to import bus_stops from the GTFS
data and use the GTFS fields, such as stop_code from stops.txt, as tags:
http://code.google.com/transit/spec/transit_feed_specification.html#stops_txt___Field_Definitions

Tools such as JOSM have a search facility, so we should be able to search
for bus_stops without a stop_code and then reconcile them with the ones
that have a stop_code tag.  If the GTFS data is wrong, we should be able to
send a report somewhere, probably to the transit authority, saying that we
think this stop is incorrect.
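
A minimal Python sketch of that workflow, under stated assumptions: stops.txt 
is available as a string, the real GTFS fields stop_name, stop_code, stop_lat 
and stop_lon are used, and the OSM tag key stop_code is the one proposed 
above rather than an agreed tagging scheme. The function names and the node 
dictionaries are made up for illustration.

```python
import csv
import io


def gtfs_stops_to_osm_nodes(stops_txt):
    """Yield one candidate OSM node (lat, lon, tags) per row of stops.txt."""
    for row in csv.DictReader(io.StringIO(stops_txt)):
        tags = {"highway": "bus_stop", "name": row["stop_name"]}
        # stop_code is optional in GTFS, so only tag it when it is present.
        code = (row.get("stop_code") or "").strip()
        if code:
            tags["stop_code"] = code
        yield {
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
            "tags": tags,
        }


def split_by_stop_code(osm_stops):
    """Separate stops that carry a stop_code tag from those that do not."""
    with_code = [s for s in osm_stops if s["tags"].get("stop_code")]
    without_code = [s for s in osm_stops if not s["tags"].get("stop_code")]
    return with_code, without_code
```

The second list is the set a mapper would then reconcile by hand against the 
imported stops.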

My personal view is that, while we should respect work already done, adding
extra tags in this way doesn't remove that work, and it is up to the
rendering rules to either omit or include a particular bus_stop for
display.  Stops can be selected by the presence or absence of a stop_code
tag, certainly in Maperitive.

If we were to do this, we would probably need some sort of wiki write-up
and a standard way to label bus_stops.  Currently in Ottawa I've seen at
least four different ways, the ones that include the bus route being the
least useful, as they tend to be out of date.

I don't think mapping routes works at all well.  Certainly in Ottawa the
bus stops and stop_codes stay in the same physical place, but the bus
routes can be modified three or four times a year.  Some changes are
greater than others.  Also, the transit route planning system, which can be
accessed from the web or by phone, includes school buses that are not
listed at the stop but do sometimes provide a useful and quicker way to get
from point A to point B.

Cheerio John



Re: [Talk-transit] GTFS compatibility

2010-06-30 Thread Joe Hughes
Many transport agencies/operators still don't have rider-facing stop
codes, and some may have them for only a subset of their stops.
However, as you say, where they *are* present, they present the most
stable dataset-local identifier for a stop, if only because of the costs
involved in changing real-world signage.

The other contenders in GTFS are the stop_id (which is unstable,
originating as it does from the whims of the agency's database
maintainers) and the stop_name (which is more generally stable but
still subject to occasional tweaking, and which is not guaranteed to
be dataset unique).
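
One way to compare these candidate identifiers for a given feed is to measure 
how often each field is filled in and whether its values are unique. The 
Python sketch below does only that; it does not decide which field to use, 
and the function name and report layout are assumptions for illustration. The 
field names stop_code, stop_id and stop_name are the GTFS ones.

```python
from collections import Counter


def identifier_report(stops, fields=("stop_code", "stop_id", "stop_name")):
    """Summarize coverage and uniqueness of candidate identifier fields.

    `stops` is a list of dicts, one per row of stops.txt.
    """
    report = {}
    for field in fields:
        values = [(s.get(field) or "").strip() for s in stops]
        present = [v for v in values if v]
        counts = Counter(present)
        report[field] = {
            # Share of stops that have a non-empty value for this field.
            "coverage": len(present) / len(stops) if stops else 0.0,
            # True when no non-empty value occurs more than once.
            "unique": all(n == 1 for n in counts.values()),
            "duplicates": sorted(v for v, n in counts.items() if n > 1),
        }
    return report
```

Stability over time cannot be read from a single feed; checking that would 
need the same report run against successive versions of the data.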

It seems clear that there will always need to be some sort of fuzzy
matching of stops employed in subsequent updates/imports of data in
countries that have no national registry.  It also seems plausible to
me that someone could build an OSM-based shadow global stop registry,
and thus leapfrog national governments that haven't built such a
database of their own.
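
As one possible shape for such fuzzy matching, the Python sketch below 
combines a distance gate with name similarity from the standard library's 
difflib. The 250-meter gate, the 0.8 similarity threshold, the dictionary 
layout, and the function names are all assumptions chosen for the example, 
not tuned or recommended values.

```python
import math
from difflib import SequenceMatcher


def distance_m(a, b):
    """Approximate distance in meters (equirectangular; fine for short ranges)."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6371000.0 * math.hypot(dx, dy)


def name_similarity(a, b):
    """Case-insensitive similarity ratio between two stop names, 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(new_stop, known_stops, max_distance_m=250.0, min_similarity=0.8):
    """Return the most name-similar known stop within range, or None."""
    best, best_score = None, 0.0
    for known in known_stops:
        if distance_m(new_stop, known) > max_distance_m:
            continue  # too far away to be the same physical stop
        score = name_similarity(new_stop["name"], known["name"])
        if score >= min_similarity and score > best_score:
            best, best_score = known, score
    return best
```

Stop names along one street tend to resemble each other, so a returned match 
is a suggestion for review, and a None result does not prove the stop is new.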

Cheers,
Joe
