Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Apollinaris Schoell

On 5 Aug 2010, at 14:43 , Alan Mintz wrote:

 At 2010-08-05 11:52, Ian Dees wrote:
 ...
 It isn't any different. I had made the (bad) decision at the time to import 
 over any existing data because in the several hundred places I spot-checked, 
 NHD was vastly superior in resolution (and probably quality).
 
 By import over, do you mean to add duplicates, replace the existing 
 features, or merge the info from the two manually?
 
 As I manually survey various features (POIs, some hydro, etc.), I usually try 
 to merge in the data from existing imports so as to maintain the link (e.g. 
 gnis:feature_id) back to the original database, in case we want to exchange 
 updates with them again.
 

this is impossible due to the license terms, 

 One thing that occurs to me that may be a problem is that I occasionally have 
 to delete a feature that is no longer present (e.g. 
 http://www.openstreetmap.org/browse/node/358808220). If we were to feed an 
 update back to GNIS or get one from them, this situation would have to be 
 taken into account.
 
 --
 Alan Mintz alan_mintz+...@earthlink.net
 
 
 ___
 Talk-us mailing list
 Talk-us@openstreetmap.org
 http://lists.openstreetmap.org/listinfo/talk-us




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Serge Wroclawski
Moving away from discussions of specific imports, I'd like to explore
what people think about a few areas of this discussion:

1) When someone says I want to import X, what should our first response be?

2) When someone points out a widespread problem (such as the Salt Lake
City addresses), how do we want to proceed?

3) Is it better to discourage bots and imports (as we do currently) or
better to heavily document bots and set up standardized methods? (and
do people think those methods will be used?)

4) In the US, what (if any) role should OSM US play in imports?


And now my .02:

1. I think the first reaction to a request to import should be
something that outlines the danger to OSM of importing. That's the
guide this thread talks about. We want to impress on the user the
potential pitfalls and encourage them to work with the community,
maybe even discovering that the data set was known previously and not
imported for a reason.

2. I think widespread bot fixes should be encouraged to wait 10
days. It's just too easy to make a large change and too hard to fix
it. I'd also suggest that we (as a community) develop tools to make it
easier to demonstrate what an import or bot would do on a test server.

Imagine I want to fix all the streets in Cleveland. I could spin up an
instance of Cleveland as of a certain time, apply my changes to that
test site, and show it off to the large community, soliciting
feedback.

This isn't really feasible right now using existing OSM methods.

3. I think imports and bots are inevitable, so the more documented we
make the process, the less we encourage people to go wild and write
their own. At the same time, we want to discourage bots and imports in
general.

4. I think OSM US can play a significant role in two ways. I think the
organization can help by working with governments to make data sets
available. And I think it could possibly help with some equipment and
infrastructure. Those are why I'm involved in OSM US now, and (blatant
plug) why I'm running for office on the next board.

At the same time, I think the process needs to be bottom-up community driven.

- Serge



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Emilie Laffray
On 5 August 2010 20:27, Ian Dees ian.d...@gmail.com wrote:

 On Sat, Jan 8, 2000 at 3:20 PM, Katie Filbert filbe...@gmail.com wrote:

 The difference with NHD is that we are leaving conversion to osm format
 for the local mapper / importer.  Since OSM US has server space, maybe
 that's good use of it to host converted data ready for import.


 I like this... the NHD status page on the wiki sort of already does this in
 a backwards way. Perhaps I will look into writing a web tool to keep track
 of the import and give easy access to the pre-generated OSM files for
 subbasins.


I think keeping the data in a database and generating files on the fly with a
more advanced interface like http://clc.openstreetmap.fr would be very good. I
have been meaning to implement a web service that would generate OSM files
with specific functionality, based on some kind of layer and requests, to
power a site like the one linked above.

Emilie Laffray


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Katie Filbert
On Fri, Aug 6, 2010 at 9:11 AM, Serge Wroclawski emac...@gmail.com wrote:

 Moving away from discussions of specific imports, I'd like to explore
 what people think about a few areas of this discussion:

 1) When someone says I want to import X, what should our first response
 be?


The nature of OSM with few rules (compared to say the many rules on
Wikipedia) is appealing in some aspects and I don't want to see OSM become
burdened with so many rules.

At the same time, we might learn some lessons from how Wikipedia handles
bots...

1) Anyone that wants to run a bot or new tasks for an existing bot
(automated or semi-automated tasks) must submit a request to the bot
approval group (BAG). Others are free to comment on the request, in addition
to BAG.

2) You explain what the bot will be doing.  The BAG assesses whether it's a
good idea, and gives constructive feedback.

3) Bot operators are encouraged to share the code, at least with BAG, but
ideally make it open source so others can review it.

4) The bot then goes through a trial (e.g. doing 50 edits)

5) The bot runs on a separate account from the user's normal account.  The
bot account is flagged, so it's hidden by default from Special:RecentChanges
and gets higher API rate limits.

The bot's user page has information on who's running the bot, what it's
doing, a bot shutoff button that anyone can use if the bot is AWOL, and info
on how to contact the bot operator, and the bot operator needs to be
responsive.

http://en.wikipedia.org/wiki/Wikipedia:Bots

Certainly not all bots and imports are bad, but I would be happy to have
such careful attention and review for OSM bots and imports to help ensure
the task is suitable, the bot works properly, and is not disruptive or
harmful to the community.


 2) When someone points out a widespread problem (such as the Salt Lake
 City addresses), how do we want to proceed?


I'm not totally convinced it's effective, but Wikipedia handles disputes and
issues with requests for comments and tries to reach consensus.  For
something like the addresses, there may not be 100% consensus, but, say,
3/4 agreement would be good, making compromises as necessary to get there.

http://en.wikipedia.org/wiki/WP:RFC

Things can escalate from there, if necessary.  For OSM, we tend to discuss
things on the mailing list, and we may want to do things differently.  Not
sure what's best.


 3) Is it better to discourage bots and imports (as we do currently) or
 better to heavily document bots and set up standardized methods? (and
 do people think those methods will be used?)


See above (1).

Furthermore, Wikipedia users have gone as far as to create bot frameworks
(pywikipedia) that are well-tested and there are tools (e.g.
autowikibrowser) for semi-automated edits.

For OSM, something else we ought to do better with is using the dev API
server (http://api06.dev.openstreetmap.org/).  Last I knew, it's not
populated with data except what individuals put in it.  It would be great
if the dev server were instead a full, up-to-date mirror of OSM that people
could use to test imports and semi/fully automated edits.  I think this is
especially important since, unlike setting up MediaWiki, it's not so simple
for individuals to set up their own OSM stack.

With more testing and more eyes on bots and imports, I think it becomes
easier to weed out the bad ones and let the good, useful ones proceed.


 4) In the US, what (if any) role should OSM US play in imports?


Not sure it needs to be OSM US specifically, but having a staging area (e.g.
to store copies of data imported -- in original osm format?) and a good
development server for testing are important.



 And now my .02:

 1. I think the first reactions to a request to import should be
 something that outlines the danger to OSM of importing. That's the
 guide this thread talks about. We want to impress on the user the
 potential pitfalls and encourage them to work with the community-
 maybe even discovering that the data set was known previously and not
 imported for a reason.


Community feedback is indeed important.


 2. I think widespread bot fixes should be encouraged to wait 10
 days. It's just too easy to make a large change and too hard to fix
 it. I'd also suggest that we (as a community) develop tools to make it
 easier to demonstrate what an import or bot would do on a test server.

 Imagine I want to fix all the streets in Cleveland. I could spin up an
 instance of Cleveland as of a certain time, apply my changes to that
 test site, and show it off to the large community, soliciting
 feedback.


Agree.



 This isn't really feasible right now using existing OSM methods.

 3. I think imports and bots are inevitable, so the more documented we
 make the process, the less we encourage people to go wild and write
 their own. At the same time, we want to discourage bots and imports in
 general.


I agree to some extent about discouraging bots and imports, at the same time
realize that in 

Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Apollinaris Schoell

On 6 Aug 2010, at 1:45 , Nathan Edgars II wrote:

 On Fri, Aug 6, 2010 at 3:50 AM, Apollinaris Schoell ascho...@gmail.com 
 wrote:
 
 On 5 Aug 2010, at 14:43 , Alan Mintz wrote:
 
 As I manually survey various features (POIs, some hydro, etc.), I usually 
 try to merge in the data from existing imports so as to maintain the link 
 (e.g. gnis:feature_id) back to the original database, in case we want to 
 exchange updates with them again.
 
 
 this is impossible due to the license terms,
 
 There are no (valid) license terms applicable to something of the form
 "OSM deleted feature 687645; check independently whether it exists and
 delete it from GNIS if not."

Sure, not in this form. But this form requires so much work on the GNIS side
that it will probably never happen. A node can be deleted for so many reasons
that, without documentation, the deletion has no value.


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Serge Wroclawski
Good to see your comments getting through Katie (I was one of the
people who didn't get your emails before).

On Fri, Aug 6, 2010 at 11:59 AM, Katie Filbert filbe...@gmail.com wrote:
 On Fri, Aug 6, 2010 at 9:11 AM, Serge Wroclawski emac...@gmail.com wrote:

 Moving away from discussions of specific imports, I'd like to explore
 what people think about a few areas of this discussion:

 1) When someone says I want to import X, what should our first response
 be?

 The nature of OSM with few rules (compared to say the many rules on
 Wikipedia) is appealing in some aspects and I don't want to see OSM become
 burdened with so many rules.

 At the same time, we might learn some lessons from how Wikipedia handles
 bots...

 1) Anyone that wants to run a bot or new tasks for an existing bot
 (automated or semi-automated tasks) must submit a request to the bot
 approval group (BAG). Others are free to comment on the request, in addition
 to BAG.

 2) You explain what the bot will be doing.  The BAG assesses whether it's a
 good idea, and gives constructive feedback

 3) Bot operators are encouraged to share the code, at least with BAG, but
 ideally make it open source so others can review it.

 4) The bot then goes through a trial (e.g. doing 50 edits)

 5) The bot runs on a separate account from the user's normal account.  The
 bot account is flagged, so it's hidden by default from Special:RecentChanges
 and gets higher API rate limits.

 The bot's user page has information on who's running the bot, what it's
 doing, bot shutoff button that anyone can use if the bot is AWOL, info on
 how to contact the bot operator, and the bot operator needs to be
 responsive.

 http://en.wikipedia.org/wiki/Wikipedia:Bots

I think these are all very reasonable.

 2) When someone points out a widespread problem (such as the Salt Lake
 City addresses), how do we want to proceed?

 I'm not totally convinced it's effective, but Wikipedia handles disputes and
 issues with requests for comments and tries to reach consensus.  For
 something like the addresses, there may not be 100% consensus but say,
 3/4 agreement would be good, making compromises necessary to get there.

 http://en.wikipedia.org/wiki/WP:RFC

I think if we have a process like this, we'd want it more streamlined,
but I like the approach of having a validation and feedback period.

 3) Is it better to discourage bots and imports (as we do currently) or
 better to heavily document bots and set up standardized methods? (and
 do people think those methods will be used?)

 See above (1).

 Furthermore, Wikipedia users have gone as far as to create bot frameworks
 (pywikipedia) that are well-tested and there are tools (e.g.
 autowikibrowser) for semi-automated edits.

I agree with this. I don't think we need officially blessed bots, but
most of us have already made our own bot frameworks (I know I did), so
unless there's a compelling reason, why replicate the work?

Having a tool to display the changes is really important IMHO.
Sometimes those changes will be something where it'll be obvious and
rendering it as tiles would be good. Other times the changes won't be
something that renders, and we'll need to find a way to display the
differences in a meaningful way, but if we made a framework for it,
hopefully we could plug in that functionality as we went along.
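
For changes that don't render, one way to show the differences is a plain tag diff between a "before" and an "after" extract. The following is only a minimal sketch, assuming both snapshots are available as OSM XML strings; the function names `load_tags` and `diff_tags` are made up for illustration and are not part of any existing OSM tool.

```python
import xml.etree.ElementTree as ET


def load_tags(osm_xml):
    """Map (element type, id) -> tag dict for every node, way and relation."""
    root = ET.fromstring(osm_xml)
    elements = {}
    for elem in root:
        if elem.tag in ("node", "way", "relation"):
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            elements[(elem.tag, elem.get("id"))] = tags
    return elements


def diff_tags(before_xml, after_xml):
    """Return a list of (change, element key, old tags, new tags) tuples."""
    before, after = load_tags(before_xml), load_tags(after_xml)
    changes = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old is None:
            changes.append(("created", key, {}, new))
        elif new is None:
            changes.append(("deleted", key, old, {}))
        elif old != new:
            changes.append(("modified", key, old, new))
    return changes
```

This only compares tags, not geometry or way membership, so it would be one plug-in among several in the kind of framework described above.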

 For OSM, something else we ought to do better with is using the dev API
 server (http://api06.dev.openstreetmap.org/).  Last I knew, it's not
 populated with data except what individuals put in it.  It would be great
 if the dev server were instead a full, up-to-date mirror of OSM that people
 could use to test imports and semi/fully automated edits.  I think this is
 especially important since, unlike setting up MediaWiki, it's not so simple
 for individuals to setup their own OSM stack

api06 is meant for testing out calls to the API. That's why I suggest
something else altogether.


I also think if we start something, it'll be easier to have it adopted
by the larger OSM community.

- Serge



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Kevin Atkinson

On Fri, 6 Aug 2010, Serge Wroclawski wrote:


2. I think widespread bot fixes should be encouraged to wait 10
days. It's just too easy to make a large change and too hard to fix
it. I'd also suggest that we (as a community) develop tools to make it
easier to demonstrate what an import or bot would do on a test server.


If I had to wait 10 days there is a good chance I would likely have lost
interest.  I have been trying to say that there are different levels of
bots and of the damage they can do.





Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Serge Wroclawski
On Fri, Aug 6, 2010 at 3:13 PM, Kevin Atkinson ke...@atkinson.dhs.org wrote:
 On Fri, 6 Aug 2010, Serge Wroclawski wrote:

 2. I think widespread bot fixes should be encouraged to wait 10
 days. It's just too easy to make a large change and too hard to fix
 it. I'd also suggest that we (as a community) develop tools to make it
 easier to demonstrate what an import or bot would do on a test server.

 If I had to wait 10 days there is a good chance I would likely have lost
 interest.  I have been trying to say that there are different levels of bots
 and the amount of damage they can do.

If you lose interest quickly, it can't be very important to you.

Minor edits can be done immediately, but any time you're making a mass
change across a wide geographic region (like an entire city), that
requires planning, thinking, feedback. Those things take time, not
just for the person who wants to make the change, but for the rest of
the community to catch up, check the edit out, give feedback, etc.

The best edits in OSM took months of planning.

- Serge



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Kevin Atkinson

On Fri, 6 Aug 2010, Katie Filbert wrote:


1) Anyone that wants to run a bot or new tasks for an existing bot
(automated or semi-automated tasks) must submit a request to the bot
approval group (BAG). Others are free to comment on the request, in addition
to BAG.


If I had to go through a formal approval process I would not have even
bothered with my script.  I'm not sure I would really call it a bot,
because I manually downloaded the data, ran a script on the data, then
manually uploaded the data, as opposed to something completely automatic.


And what exactly constitutes a bot?  Would the cleanup of Florida's
county route ref tagging have been a bot?  It was a large-scale,
systematic change, even if a script was not used (I think he used search
and replace in an editor).


If you want to go through with this I think you need a better definition of
a bot, which should consist of at least one of


1) Large-scale change
2) Fully automatic

Defining 1) would be tricky: something covering the whole United States
counts.  But what about an entire state, if the change is limited in scope?



For OSM, something else we ought to do better with is using the dev API
server (http://api06.dev.openstreetmap.org/).


I did not know that site existed.  It needs to be better documented.




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Kevin Atkinson

On Fri, 6 Aug 2010, Serge Wroclawski wrote:


On Fri, Aug 6, 2010 at 3:13 PM, Kevin Atkinson ke...@atkinson.dhs.org wrote:

On Fri, 6 Aug 2010, Serge Wroclawski wrote:


2. I think widespread bot fixes should be encouraged to wait 10
days. It's just too easy to make a large change and too hard to fix
it. I'd also suggest that we (as a community) develop tools to make it
easier to demonstrate what an import or bot would do on a test server.


If I had to wait 10 days there is a good chance I would likely have lost
interest.  I have been trying to say that there are different levels of bots
and the amount of damage they can do.


If you lose interest quickly, it can't be very important to you.


Maybe most likely is a little strong, but I was trying to make a point.

Just because a change is not very important to me doesn't mean it
isn't a good change that can make the map better.


Also, it might not be that I lost interest, but rather that I simply don't
have the time.  I may have some time now, but may not in two weeks.


I can fully understand that bots can do a lot of damage.  But I was also
very careful and limited the scope of what my bot did.  I also have a
clear plan to undo the controversial part of my change if anyone should
object in the future.


I honestly don't think I can say anything else without coming off as a
reckless, impatient jerk who wants things done now or never.


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Kevin Atkinson

On Fri, 6 Aug 2010, Kevin Atkinson wrote:


On Fri, 6 Aug 2010, Katie Filbert wrote:


1) Anyone that wants to run a bot or new tasks for an existing bot
(automated or semi-automated tasks) must submit a request to the bot
approval group (BAG). Others are free to comment on the request, in
addition to BAG.


If I had to go through a formal approval process I would not have even
bothered with my script.  I'm not sure I would really call it a bot, because
I manually downloaded the data, ran a script on the data, then manually
uploaded the data, as opposed to something completely automatic.


Again, maybe I was a little strong.

But if you do want some sort of formal approval process, can you please at
least cut out some of the steps for those who have already gone through the
process once and have proven to be competent and unlikely to do anything
that will lead to a mess later?





Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Alan Mintz

At 2010-08-06 06:11, Serge Wroclawski wrote:

...
1. I think the first reactions to a request to import should be
something that outlines the danger to OSM of importing.


The biggest danger of which, IMO, is duplication of existing data. I 
believe many newbies will want to import datasets that already have at 
least some representation in existing data, given that we already have 
transportation, hydrography, admin boundaries, and some POIs. This last 
category might be one of the only ones that could be genuinely useful, like 
importing a chain of restaurants, fuel stations, etc.


Import of most county land datasets (parcels, addresses, centerlines) is
far more difficult in that it is really more of a comparison and
synchronization than an addition of data. Someone else noted the import in
the city of Bakersfield, CA, which included parcel and building outlines, as
well as landuse polygons that follow street edges in excruciating detail.
It seems that, while interesting to look at, at least some of this
should have been discussed first, as it resulted in 10x the number of
objects as similar areas with just centerlines.




2. I think widespread bot fixes should be encouraged to wait 10
days.


Yes. Someone said something like "just long enough to annoy the author."
Anyone who subscribes to multiple lists could easily miss something
important or be unable to comment on it for several days. The importer should
also send a "last call" a day or two before.




3. I think imports and bots are inevitable, so the more documented we
make the process, the less we encourage people to go wild and write
their own. At the same time, we want to discourage bots and imports in
general.


It would be nice to have some boilerplate search/replace code or an app to 
use. Another issue is that of co-ordinating efforts. A few times, I walked 
through tagwatch and downloaded/corrected/uploaded by hand one bad key at a 
time until I got bored. I know there are people out there doing this, too, 
but it would be nice if there were a page we could use to divvy up and 
co-ordinate those efforts.
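
As an illustration of what such boilerplate search/replace code might look like, here is a minimal sketch that renames one bad key in a downloaded OSM XML extract. It assumes the extract is held as a string; the function name `rename_key` is hypothetical, and the `action="modify"` attribute follows the JOSM file-format convention for marking changed elements, which other upload tools may not honour.

```python
import xml.etree.ElementTree as ET


def rename_key(osm_xml, bad_key, good_key):
    """Rename bad_key to good_key on every element of an OSM XML extract.

    Elements that already carry good_key are left untouched and reported,
    so that a human can resolve the conflict by hand.
    """
    root = ET.fromstring(osm_xml)
    changed = 0
    skipped = []
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue
        tags = elem.findall("tag")
        keys = {t.get("k") for t in tags}
        if bad_key not in keys:
            continue
        if good_key in keys:
            skipped.append(f"{elem.tag}/{elem.get('id')}")
            continue
        for tag in tags:
            if tag.get("k") == bad_key:
                tag.set("k", good_key)
        # Assumption: JOSM-style marker so an editor uploads this element.
        elem.set("action", "modify")
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed, skipped
```

The list of skipped elements is deliberately returned rather than silently merged, since deciding between two conflicting values is exactly the kind of step that should stay manual.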


--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Greg Troxel

Serge Wroclawski emac...@gmail.com writes:

 Moving away from discussions of specific imports, I'd like to explore
 what people think about a few areas of this discussion:

 1) When someone says I want to import X, what should our first response be?

I think your reaction to point out the danger is fair.  But, living in
an area with a lot of high-quality data that has been imported rather
well, I'm not anti-import.  But I am in the "imports should be
exceedingly well thought out" camp.

 2) When someone points out a widespread problem (such as the Salt Lake
 City addresses), how do we want to proceed?

Some things need automated edits to fix.  I would like to see safe
frameworks for this in osm svn/git/whatever, and more or less require
that the code to be run for fixups be stored as part of the community
history.  It's clear that things need to be fixed, and the challenge is
to make the fixes be net positive.

 3) Is it better to discourage bots and imports (as we do currently) or
 better to heavily document bots and set up standardized methods? (and
 do people think those methods will be used?)

I think most people doing automated imports are doing so because they
want to fix something that's broken, and most are patient.  If we
provide skeleton code and especially a way to see how the fix works
before it's really committed, I think most people would be cooperative.

In my case, I've thought about several automated edits (and done zero):

  duplicate nodes at town boundaries in roads due to massgis highway
  layer.  I wrote on talk-us about what I think ought to be done, in
  terms of outlining a precondition for two nodes on same place,
  massgis tags, each the end node in a highway way with massgis tags.
  Somehow, most of this got fixed, and I don't know if it was part of
  the general de-dupe rampage or someone doing a more targeted edit.
  But as far as I can tell it was done right, and a good outcome.

  In MA, landuse=reservoir is on lots that are really reservoir
  protection.  They render blue, and I think they should be retagged.
  Or maybe mapnik and the tagging rules should be fixed.  I haven't gotten
  around to this - I have gotten the clue to tread lightly and I've been
  busy.

  fuzzy matching on GNIS vs massgis points, and merging them, taking
  massgis locations, in cases where no human has edited the GNIS points.
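
The precondition in the first item above (two nodes in the same place, both with massgis tags, each the end node of a highway way with massgis tags) could be written down roughly as follows. This is a sketch under stated assumptions, not the check that was actually run: the in-memory data layout, the function names, and the rule that "massgis tags" means any key starting with `massgis:` are all made up for illustration.

```python
def has_massgis_tags(tags):
    # Assumption: imported objects carry at least one "massgis:"-prefixed key.
    return any(key.startswith("massgis:") for key in tags)


def ends_massgis_highway(node_id, ways):
    """True if node_id is the first or last node of a MassGIS highway way."""
    for way in ways:
        refs = way["nodes"]
        if (refs and "highway" in way["tags"]
                and has_massgis_tags(way["tags"])
                and node_id in (refs[0], refs[-1])):
            return True
    return False


def merge_precondition(a_id, b_id, nodes, ways):
    """Check whether two distinct nodes are candidates for de-duplication."""
    a, b = nodes[a_id], nodes[b_id]
    return (
        a_id != b_id
        and (a["lat"], a["lon"]) == (b["lat"], b["lon"])
        and has_massgis_tags(a["tags"])
        and has_massgis_tags(b["tags"])
        and ends_massgis_highway(a_id, ways)
        and ends_massgis_highway(b_id, ways)
    )
```

Writing the precondition as a pure function makes it easy to run over a state extract and review the candidate pairs before anything is uploaded.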


Bots are another story; that's a long-term running process that does
automated edits whenever preconditions are satisfied.  Those are scarier
than someone grabbing a state extract, running an automated edit,
reviewing the results, maybe sharing them for review by others, and
choosing to push upload.

For imports, I've thought about several, and the common theme is
ENOSPARTIME, but the list is

  parcel data, but not imported because a) I'm not sure what I think is
  right, and b) I'm not sure what community consensus is.

  merging updates to massgis highway data, but this is hard

  importing NHD or massgis hydro

  importing more massgis rails/trails/etc.

  importing the towns w/o highway data, but there's a lot of manual
  merging (e.g. gloucester).  This leads to thoughts of writing code to
  auto-merge, which leads to it not happening due to not enough time.

 4) In the US, what (if any) role should OSM US play in imports?

Perhaps helping with the above, and being elder statesmen about advice.


So all in all, my level of restraint, but a higher level of spare time,
is probably where we want people to be.  One thought is that someone
wanting to import should probably have done some manual mapping first,
to get their head around the norms and community.




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-06 Thread Alan Millar
As I manually survey various features (POIs, some hydro, etc.), I  
usually try to merge in the data from existing imports so as to  
maintain the link (e.g. gnis:feature_id) back to the original  
database, in case we want to exchange updates with them again.


this is impossible due to the license terms,


That may be the short quick answer, but it is not the long answer.   
The link will be valuable as we figure out other ways to synchronize  
the data and/or make dual-license updates; either originated from OSM  
or from the other party like USGS.  Simple? No.  Impossible?  No.
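
To make the idea of keeping the link concrete, here is a minimal sketch of merging surveyed tags into an imported feature while preserving the reference back to the source database. The function name and the choice to let surveyed values win on conflict are assumptions for illustration; only the `gnis:feature_id` key comes from the discussion above.

```python
def merge_survey_into_import(imported_tags, surveyed_tags,
                             link_keys=("gnis:feature_id",)):
    """Combine tags from a survey with those of an imported feature.

    Surveyed values override imported ones, except for the link keys,
    which always keep the imported value so the feature can still be
    matched against the original database later.
    """
    merged = dict(imported_tags)
    merged.update(surveyed_tags)
    for key in link_keys:
        if key in imported_tags:
            merged[key] = imported_tags[key]
    return merged
```

Neither input dictionary is modified, so the same imported feature can be compared against several candidate surveys.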


- Alan





Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Frederik Ramm

Hi,

Richard Weait wrote:

Required reading:

http://www.asklater.com/matt/wordpress/2009/09/imports-and-the-community/
http://www.asklater.com/matt/wordpress/2009/09/imports-and-the-community-ii/


I also like The Pottery Club:

http://www.gravitystorm.co.uk/shine/archives/2009/11/10/the-pottery-club/

Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Nathan Edgars II
One thing I'm wondering about: how useful is a small piece of a future
larger import? For example, there's the National Hydrography Dataset,
import of which is apparently being coordinated on the wiki. I've
imported individual lakes and swamps from it, as well as all of those
in small areas (such as Disney World). Obviously this is a good thing
if I'm working on the area. But does it help at all for a future
larger import, or is it just more 'noise' like lakes drawn from
aerials?



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Ian Dees
On Thu, Aug 5, 2010 at 1:27 PM, Nathan Edgars II nerou...@gmail.com wrote:

 One thing I'm wondering about: how useful is a small piece of a future
 larger import? For example, there's the National Hydrography Dataset,
 import of which is apparently being coordinated on the wiki. I've
 imported individual lakes and swamps from it, as well as all of those
 in small areas (such as Disney World). Obviously this is a good thing
 if I'm working on the area. But does it help at all for a future
 larger import, or is it just more 'noise' like lakes drawn from
 aerials?


I think the NHD import is a good example of a well-intentioned importer
(me) gone wrong. I had initially planned to import the whole darn thing in
one swoop, but various technical and life challenges came up before I could
get it going. While I was working on those issues, people started importing
it themselves (sometimes marking so on the wiki, sometimes not). Now that
there are some areas imported, the import of the whole dataset becomes
infinitely harder because we have to match existing data with new OSM-ified
data.

What I'm trying to say is that once a small part of an import happens, the
larger import probably doesn't make sense to do.


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Nathan Edgars II
On Thu, Aug 5, 2010 at 2:38 PM, Ian Dees ian.d...@gmail.com wrote:
 I think the NHD import is a good example of a well-intentioned importer
 (me) gone wrong. I had initially planned to import the whole darn thing in
 one swoop, but various technical and life challenges came up before I could
 get it going. While I was working on those issues, people started importing
 it themselves (sometimes marking so on the wiki, sometimes not). Now that
 there are some areas imported, the import of the whole dataset becomes
 infinitely harder because we have to match existing data with new OSM-ified
 data.

But how is this any different from importing it into an area where
people have already mapped some lakes from aerials?

Personally I think the best US example of a bad import is the
environmental hazards.



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Ian Dees
On Thu, Aug 5, 2010 at 1:47 PM, Nathan Edgars II nerou...@gmail.com wrote:

 On Thu, Aug 5, 2010 at 2:38 PM, Ian Dees ian.d...@gmail.com wrote:
  I think the NHD import is a good example of a well-intentioned importer
  (me) gone wrong. I had initially planned to import the whole darn thing in
  one swoop, but various technical and life challenges came up before I could
  get it going. While I was working on those issues, people started importing
  it themselves (sometimes marking so on the wiki, sometimes not). Now that
  there are some areas imported, the import of the whole dataset becomes
  infinitely harder because we have to match existing data with new OSM-ified
  data.

 But how is this any different from importing it into an area where
 people have already mapped some lakes from aerials?


It isn't any different. I had made the (bad) decision at the time to import
over any existing data because in the several hundred places I spot-checked,
NHD was vastly superior in resolution (and probably quality).

If we wanted to support imports as a community, a tool like [0] should be
the only way of letting imports in to OSM.

[0]
http://wiki.openstreetmap.org/wiki/OSM_Import_Database#French_Corine_Import_as_Template


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Ian Dees
On Sat, Jan 8, 2000 at 3:20 PM, Katie Filbert filbe...@gmail.com wrote:

 The difference with NHD is that we are leaving conversion to osm format for
 the local mapper / importer.  Since OSM US has server space, maybe that's
 good use of it to host converted data ready for import.


I like this... the NHD status page on the wiki sort of already does this in
a backwards way. Perhaps I will look in to writing a web tool to keep track
of the import and give easy access to the pre-generated OSM files for
subbasins.


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Richard Weait
On Sat, Jan 8, 2000 at 4:20 PM, Katie Filbert filbe...@gmail.com wrote:
 Bad imports are bad for the osm.  High quality data carefully imported is
 helpful.  If such high quality data is available for us that is as good or
 better than what we can do ourselves, then it's fine not to reinvent the
 wheel. Where it's lower quality data than what we can do ourselves, then
 let's not use it.
[ ... ]
 Leaving imports to local mappers is good.  They are best able to assess the
 quality of the data for that area and care about the quality of their local
 map data.  It also leaves low hanging fruit for them. Some areas without
 local mappers may take longer to finish. That is okay.

I have no arguments with this.

Consider this: Does importing to an area where there is no thriving
OSM community inhibit the creation of that thriving community in
future?

At SotM, one of our friends suggested that imports are okay, except for
road networks.  Never import road networks.  The suggestion is that
building the road network also builds the community.  An existing road
network inhibits the community.  I apologize for not attributing that
comment.  I've forgotten who said it to me.

Or from another point of view.  If the local community isn't
substantial enough to maintain the imported data and keep it up to
date, is it better to not import until the community can maintain it?
Why import 2004 data, if it will be unchanged when the 2006 update is
published?  Does that mean that you should only import once you have
such a thriving community and high quality local data that you no
longer would benefit substantially from that import?



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread David Carmean
On Thu, Aug 05, 2010 at 01:38:36PM -0500, Ian Dees wrote:


 I think the NHD import is a good example of a well-intentioned importer
 (me) gone wrong. I had initially planned to import the whole darn thing in
 one swoop, but various technical and life challenges came up before I could
 get it going. While I was working on those issues, people started importing
 it themselves (sometimes marking so on the wiki, sometimes not). Now that
 there are some areas imported, the import of the whole dataset becomes
 infinitely harder because we have to match existing data with new OSM-ified
 data.

And I'll add my own mea culpa.  I created some wiki pages/features to
help partition and coordinate NHD import efforts, and then also found
that I didn't have the time to follow up.

I would agree that the partial imports will have increased the difficulty
of a large-scale bulk import, but we already had hydrographic features
from TIGER, did we not?  And hand-drawn features from aerial traces and
actual boots-on-the-ground mapping.  Conflation in general is a tough
problem, I gather.  There are tools, algorithms and heuristics in the
GIS world, but the OSM data model makes translation between the two models
somewhat difficult.

For example, something that looks very interesting which I plan to examine
is the Java Conflation Suite [1], which looks like it could be used over
relatively small areas (probably about the size of the API limit... 0.25
degrees square?).  But as a component of the JUMP[2] platform, it operates
only on Shapefiles and GML out of the box.  (If we could get some Java
expertise I think it would be very worthwhile working with the JUMP team
to create an OSM driver.)
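
To work within a size limit like the one mentioned above, a large import
area could be cut into tiles before any conflation step is run. The sketch
below is only an illustration in Python and is not part of JCS or JUMP; the
function name tile_bbox and the max_side default of 0.25 degrees are
assumptions based on the tentative figure in this message, not a verified
API limit.

```python
import math


def tile_bbox(min_lon, min_lat, max_lon, max_lat, max_side=0.25):
    """Split a bounding box into tiles no larger than max_side degrees per side.

    max_side=0.25 is an assumption taken from the figure guessed at in the
    message, not a confirmed limit.
    """
    # Number of columns/rows needed so that no tile exceeds max_side.
    cols = max(1, math.ceil((max_lon - min_lon) / max_side))
    rows = max(1, math.ceil((max_lat - min_lat) / max_side))
    lon_step = (max_lon - min_lon) / cols
    lat_step = (max_lat - min_lat) / rows

    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                min_lon + c * lon_step,
                min_lat + r * lat_step,
                min_lon + (c + 1) * lon_step,
                min_lat + (r + 1) * lat_step,
            ))
    return tiles
```

Each tile is a (min_lon, min_lat, max_lon, max_lat) tuple, so a box one
degree wide and half a degree tall yields eight tiles that could be
processed one at a time.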

At any rate, while I think we could mitigate a number of problems
given some development effort, I also agree that we might want to
spend more time thinking about why we want to make the imports--and
perhaps publicly debate, if only in talking to yourself on the
project wiki page, the pros and cons of a particular import.



[1] http://www.vividsolutions.com/JCS/
[2] http://www.vividsolutions.com/jump/





Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Frederik Ramm

Katie,

   your computer thinks it is the year 2000. I see you sent that from 
your iPhone. Maybe you had your fingers on the wrong spot so it didn't 
get a time signal.


Katie Filbert wrote:
Bad imports are bad for the osm.  High quality data carefully imported 
is helpful. 


Not unconditionally.

For example, high quality data carefully imported which is a copy of
someone else's, and which is maintained professionally at the source,
may not be helpful (because while we import it as high quality, the
quality vis-a-vis the original source will deteriorate over time, with
the original source issuing updates that we cannot import easily).


Also, high quality data carefully imported which depicts things we 
cannot possibly edit - example: official airspace boundaries - is not 
helpful, since we are not a collector of data, but a data maintenance 
machine - anything static that cannot be modified by our mappers will 
always remain a foreign object.


Leaving imports to local mappers is good.  They are best able to assess
the quality of the data for that area and care about the quality of their
local map data. It also leaves low hanging fruit for them. Some
areas without local mappers may take longer to finish. That is okay.


+1 to that.

Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread David Carmean
On Thu, Aug 05, 2010 at 10:38:47PM +0200, Frederik Ramm wrote:
 Katie,
 
 your computer thinks it is the year 2000. I see you sent that from 
 your iPhone. Maybe you had your fingers on the wrong spot so it didn't 
 get a time signal.

Not only that, all of your messages (Katie) are being trapped as spam
by my provider's system, probably because of the bad date.




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Alan Mintz

At 2010-08-05 11:52, Ian Dees wrote:

...
It isn't any different. I had made the (bad) decision at the time to 
import over any existing data because in the several hundred places I 
spot-checked, NHD was vastly superior in resolution (and probably quality).


By import over, do you mean to add duplicates, replace the existing 
features, or merge the info from the two manually?


As I manually survey various features (POIs, some hydro, etc.), I usually 
try to merge in the data from existing imports so as to maintain the link 
(e.g. gnis:feature_id) back to the original database, in case we want to 
exchange updates with them again.


One thing that occurs to me that may be a problem is that I occasionally 
have to delete a feature that is no longer present (e.g. 
http://www.openstreetmap.org/browse/node/358808220). If we were to feed an 
update back to GNIS or get one from them, this situation would have to be 
taken into account.


--
Alan Mintz alan_mintz+...@earthlink.net




Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Ian Dees
On Thu, Aug 5, 2010 at 4:43 PM, Alan Mintz alan_mintz+...@earthlink.net
wrote:

 At 2010-08-05 11:52, Ian Dees wrote:

 ...
 It isn't any different. I had made the (bad) decision at the time to
 import over any existing data because in the several hundred places I
 spot-checked, NHD was vastly superior in resolution (and probably quality).


 By import over, do you mean to add duplicates, replace the existing
 features, or merge the info from the two manually?


Add duplicates.


 As I manually survey various features (POIs, some hydro, etc.), I usually
 try to merge in the data from existing imports so as to maintain the link
 (e.g. gnis:feature_id) back to the original database, in case we want to
 exchange updates with them again.

 One thing that occurs to me that may be a problem is that I occasionally
 have to delete a feature that is no longer present (e.g.
 http://www.openstreetmap.org/browse/node/358808220). If we were to feed an
 update back to GNIS or get one from them, this situation would have to be
 taken into account.


When I made the original GNIS import I saved the resulting XML and IDs
(which would have allowed us to detect deletions) but promptly lost it in a
hard drive crash.
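
Had the saved IDs survived, the deletion check could be as simple as a
set comparison. The following is a minimal sketch in Python under stated
assumptions: saved_import is a hypothetical mapping from OSM node ID to
gnis:feature_id recorded at import time, and current_node_ids is a
hypothetical list of node IDs that still exist today. It does not reflect
how the original import was actually stored.

```python
def deleted_gnis_ids(saved_import, current_node_ids):
    """Return the GNIS feature IDs whose imported nodes no longer exist.

    saved_import: dict mapping OSM node ID -> gnis:feature_id (hypothetical
                  record kept at import time).
    current_node_ids: iterable of node IDs present in the database now.
    """
    current = set(current_node_ids)
    return sorted(
        gnis_id
        for node_id, gnis_id in saved_import.items()
        if node_id not in current
    )
```

The result is the list of source records that mappers have since deleted,
which is what an exchange of updates with GNIS would need to take into
account.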


Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread James U
I have to say, after importing a large amount of NHD data (most of NC
and MN), that it is of varying quality, as was the preexisting water-related
data already on the server.  In general, I agree with Ian that it is higher
quality (both resolution and accuracy) than the preexisting data, which
largely consisted of quickly drawn Yahoo traces.  I saw very little evidence
of on-the-ground surveying of these features and don't think the import
will hinder most people from participating in OSM writ large.

I have a fair bit of experience in converting data and would be happy to
convert subbasins (these appear to be roughly 2500 square mile areas
and are documented on the wiki) for people if they want to go through
the process of double checking to make sure the data don't conflict with
or overlap already existing data.
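
As a rough illustration of that double check, the sketch below flags
converted features whose bounding boxes overlap existing ones, so that a
person can review each pair by hand. It is a coarse first-pass filter in
Python, not a conflation tool; the function names and the representation
of a feature as a name mapped to a list of (lon, lat) points are
assumptions made for the example.

```python
def bbox_of(points):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def bboxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def overlap_candidates(new_features, existing_features):
    """List (new_name, existing_name) pairs that need manual review."""
    existing_boxes = {
        name: bbox_of(points) for name, points in existing_features.items()
    }
    pairs = []
    for new_name, points in new_features.items():
        new_box = bbox_of(points)
        for old_name, old_box in existing_boxes.items():
            if bboxes_overlap(new_box, old_box):
                pairs.append((new_name, old_name))
    return pairs
```

Overlapping boxes do not prove that two features are duplicates; they only
narrow down where someone should look before uploading.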

James



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Nathan Edgars II
On Thu, Aug 5, 2010 at 6:10 PM, James U jumba...@gmail.com wrote:
 I have to say, after importing a large amount of NHD data (most of NC
 and MN), that it is of varying quality, as was the preexisting water-related
 data already on the server.  In general, I agree with Ian that it is higher
 quality (both resolution and accuracy) than the preexisting data, which
 largely consisted of quickly drawn Yahoo traces.  I saw very little evidence
 of on-the-ground surveying of these features and don't think the import
 will hinder most people from participating in OSM writ large.

On the other hand, my (extremely limited) experience is that the
aerial water traces for Disney World were superior to the NHD import
(so I quickly deleted all the dupes from NHD). But the swamps were a
lot more useful, since you can't really tell if something's swampy
without physically going there. I love how all these islands
suddenly made sense:
http://www.openstreetmap.org/?lat=28.29&lon=-81.5191&zoom=14&layers=M



Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread Kevin Atkinson


Some guides aimed at focused scripts that address a particular problem in
a well-defined area would be useful, as most of the guide is aimed at
automatic fixup bots and large-scale imports.  For example, a note in big
bold letters that large uploads take a long time would be very helpful.
A guide to using JOSM's advanced chunk upload feature would also be very
helpful.
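
The idea behind a chunked upload can be shown in a few lines. This is a
generic Python sketch of splitting a list of changes into fixed-size
pieces, not JOSM's implementation; the name chunked and the chunk size
used are assumptions for illustration.

```python
def chunked(items, chunk_size):
    """Yield successive slices of items, each at most chunk_size long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```

A script would then upload one chunk at a time, so that a failure partway
through loses only the current chunk rather than the whole upload.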


Also, I think what the bot does is very important: tag fixups are
generally a lot safer than bots that affect nodes, and those are safer
than bots that remove ways.





Re: [Talk-us] A Friendly Guide to 'Bots and Imports

2010-08-05 Thread andrzej zaborowski
Hi,

On 5 August 2010 21:46, Richard Weait rich...@weait.com wrote:
 On Sat, Jan 8, 2000 at 4:20 PM, Katie Filbert filbe...@gmail.com wrote:
 Leaving imports to local mappers is good.  They are best able to assess the
 quality of the data for that area and care about the quality of their local
 map data.  It also leaves low hanging fruit for them. Some areas without
 local mappers may take longer to finish. That is okay.

There are definitely advantages to the import being done by a local,
but, as always, there are also advantages to the import being done
by the author of the conversion script, someone who understands exactly
which parts need to be checked manually and who has done many
such imports rather than only a limited area.  (I have taken part in an
import where I made converted data available on the web for locals to
import, and I often had to spend longer fixing things after them than it
would have taken me to do it myself.)

So it's hard to stand on one side or the other, probably best to look
at it case by case.


 I have no arguments with this.

 Consider this: Does importing to an area where there is no thriving
 OSM community inhibit the creation of that thriving community in
 future?

 At SotM, one of our friends suggested that imports are okay, except for
 road networks.  Never import road networks.  The suggestion is that
 building the road network also builds the community.  An existing road
 network inhibits the community.  I apologize for not attributing that
 comment.  I've forgotten who said it to me.

 Or from another point of view.  If the local community isn't
 substantial enough to maintain the imported data and keep it up to
 date, is it better to not import until the community can maintain it?
 Why import 2004 data, if it will be unchanged when the 2006 update is
 published?  Does that mean that you should only import once you have
 such a thriving community and high quality local data that you no
 longer would benefit substantially from that import?

I totally agree here, it's a bit of a trade-off choosing the right
moment.  If you do it too soon, you get an unmaintained map of the
area.  If you do it too late, local mappers who didn't know about the
datasource contribute their time to re-collect the data, which later
clashes with the datasource and costs time to choose the better
version, to merge, and it is frustrating when someone finds out they
could have spent the time on the finer details.

Cheers
