Welcome to the Spanish part of the world! Things get even nicer when you reach Russia or Greece ;-)
My guess is that the original file was not UTF-8. You may have converted it, but during the conversion the ê, ç and similar characters got replaced. I had the same problem with major cities of the world: I ended up loading everything into PostGIS, forced to UTF-8, and then looked up the crazy characters and replaced them by hand... sigh...

So I hope you run into a real solution, because I would like to know too!

Dave Hansen wrote:
> Well, I've done virtually the entire US's TIGER data with the script,
> with no issues, but it finally choked on Puerto Rico.
>
> It gets this:
>
> not well-formed (invalid token) at line 330, column 38, byte 14569
> at /usr/local/lib/perl/5.8.8/XML/Parser.pm line 187
>
> when running on this file:
>
> http://dev.openstreetmap.org/~daveh/tiger.files/counties/PR/Adjuntas.osm
>
> I think it's the crazy characters in tags like this:
>
> <tag k="name" v="Carr Sillo de Calder�n"/>
> <tag k="tiger:name_base" v="Carr Sillo de Calder�n"/>
>
> Being a stupid American, I have no real knowledge of character sets and
> that fun. Any idea what the right way to fix this is?
>
> -- Dave
>
> _______________________________________________
> dev mailing list
> [email protected]
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
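One possible way out, sketched in Python rather than in the Perl script from the thread: an XML parser assumes UTF-8 when the file declares no encoding, so a lone Latin-1 byte such as 0xF3 (ó) would produce exactly this kind of "invalid token" error. The sketch below assumes the source bytes are ISO-8859-1, which the thread does not confirm. The function names and the sample tag are made up for illustration, with the sample modelled on the tag Dave quoted.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes as UTF-8, ASSUMING they are ISO-8859-1.

    Every byte value is a valid ISO-8859-1 character, so this never
    raises; if the assumption is wrong the output is simply wrong.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy src to dst, re-encoding only if src is not already valid UTF-8."""
    with open(src_path, "rb") as src:
        raw = src.read()
    converted = raw if is_valid_utf8(raw) else latin1_bytes_to_utf8(raw)
    with open(dst_path, "wb") as dst:
        dst.write(converted)


# Hypothetical sample: 0xF3 is "ó" in ISO-8859-1 but not valid UTF-8 here.
sample = b'<tag k="name" v="Carr Sillo de Calder\xf3n"/>'
fixed = latin1_bytes_to_utf8(sample)
```

Something like convert_file("Adjuntas.osm", "Adjuntas.utf8.osm") would then be run before handing the file to the parser. If the guess about the encoding holds, another option is to leave the bytes alone and declare encoding="ISO-8859-1" in the XML declaration at the top of the file, provided the parser supports that encoding.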

