Re: [OSM-talk] Possible UTF-8 encoding errors in changesets

2009-04-05 Thread Brett Henderson
Matt Amos wrote:
> On Sat, Apr 4, 2009 at 8:01 PM, Florian Lohoff f...@rfc822.org wrote:
>> Will this problem
>> be fixed with API 0.6? It's quite annoying to fix this again and again
>> manually.
>
> yep. the problem with bad UTF-8 in the database will be solved, both
> in the server code and in the database tables.
   
It's probably too late for most people but the changeset files have now 
been fixed.

As someone who has spent many hours dealing with the current database 
utf-8 issues I'll be ecstatic when 0.6 goes live :-)

Brett


___
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] Possible UTF-8 encoding errors in changesets

2009-04-05 Thread Matt Amos
On Sun, Apr 5, 2009 at 2:03 PM, Brett Henderson br...@bretth.com wrote:
> As someone who has spent many hours dealing with the current database utf-8
> issues I'll be ecstatic when 0.6 goes live :-)

just a little tip for anyone else who is really annoyed by broken
UTF-8 - i run all the change files through utf8sanitizer before trying
to apply them, or even parse them. since i started doing that, i
haven't had any problem with broken utf8.

cheers,

matt
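
The thread does not show how utf8sanitizer works internally, so the following is only a rough sketch of the same idea, not that tool's implementation: decode the change file leniently, replacing every malformed byte sequence with U+FFFD, and write the result back out as clean UTF-8 before handing it to a parser. The function names and file paths are hypothetical.

```python
import gzip


def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing each malformed sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def sanitize_change_file(src_path: str, dst_path: str) -> None:
    """Read a gzipped change file and write a copy that is valid UTF-8.

    Hypothetical helper: reads the whole file into memory, which is fine
    for minutely diffs but not for very large files.
    """
    with gzip.open(src_path, "rb") as src:
        text = sanitize_utf8(src.read())
    with gzip.open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
```

Replacing the bad bytes keeps the file parseable, but the affected tag value is still damaged; the sanitized copy only stops the parser from aborting.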



[OSM-talk] Possible UTF-8 encoding errors in changesets

2009-04-04 Thread Jeremy Adams
Hey everyone,

Osmosis is failing on my ROMA server with what looks to me like some kind of 
UTF-8 encoding error.  It has stopped at the 200904040742-200904040743.osc.gz 
file.  I've also tried processing the hourly changeset for that hour and got 
the same error.

The full output from Osmosis is in the attached txt file.
osmosis --rxc file=200904040742-200904040743.osc.gz --wpc database=xxx user=xxx password=xxx
Apr 4, 2009 9:40:58 AM com.bretth.osmosis.core.Osmosis run
INFO: Osmosis Version 0.29.4
Apr 4, 2009 9:40:59 AM com.bretth.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Apr 4, 2009 9:40:59 AM com.bretth.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Apr 4, 2009 9:40:59 AM com.bretth.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Apr 4, 2009 9:41:02 AM com.bretth.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion
SEVERE: Thread for task 1-rxc failed
com.bretth.osmosis.core.OsmosisRuntimeException: Unable to read XML file 200904040742-200904040743.osc.gz.
    at com.bretth.osmosis.core.xml.v0_5.XmlChangeReader.run(XmlChangeReader.java:123)
    at java.lang.Thread.run(Thread.java:636)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:687)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:372)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1719)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1041)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:950)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1515)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1292)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2723)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:624)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:486)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:810)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:740)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:110)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1208)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:525)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at com.bretth.osmosis.core.xml.v0_5.XmlChangeReader.run(XmlChangeReader.java:108)
    ... 1 more
Apr 4, 2009 9:41:02 AM com.bretth.osmosis.core.Osmosis main
SEVERE: Execution aborted.
com.bretth.osmosis.core.OsmosisRuntimeException: One or more tasks failed.
    at com.bretth.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
    at com.bretth.osmosis.core.Osmosis.run(Osmosis.java:81)
    at com.bretth.osmosis.core.Osmosis.main(Osmosis.java:30)



Re: [OSM-talk] Possible UTF-8 encoding errors in changesets

2009-04-04 Thread Florian Lohoff
On Sat, Apr 04, 2009 at 09:42:55AM -0400, Jeremy Adams wrote:
> Hey everyone,
>
> Osmosis is failing on my ROMA server with what looks to me like some kind of 
> UTF-8 encoding error.  It has stopped at the 200904040742-200904040743.osc.gz 
> file.  I've also tried processing the hourly changeset for that hour and got 
> the same error.
>
> The full output from Osmosis is in the attached txt file.

200904040742-200904040743.osc:1556: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0x22 0x2F 0x3E
Orchideenwiesen. Der Rundweg startet zwischen den Ortsteilen Wommelshausen und H

I manually edited the file, stripped off the last 2 characters,
and applied it manually ...
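
For context on the bytes reported above: 0xC3 is the lead byte of a two-byte UTF-8 sequence (for example "ö" is 0xC3 0xB6), so it must be followed by a continuation byte in the range 0x80 to 0xBF. Here it is followed by 0x22, a plain quotation mark, which suggests the value was cut off in the middle of a character. A minimal sketch for locating such a spot before editing by hand, assuming the file has already been decompressed and read into memory (the function name is hypothetical):

```python
def first_invalid_utf8(raw: bytes):
    """Return (offset, next four bytes) of the first malformed UTF-8
    sequence in raw, or None if the data decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset where strict decoding failed.
        return err.start, raw[err.start:err.start + 4]
    return None


# Sample input built from the bytes in the parser error above.
sample = b'und H\xc3"/>'
print(first_invalid_utf8(sample))
```

The offset is a byte position, not a line number, so it is most useful with a hex viewer or after slicing the surrounding bytes out of the file.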

The next files

200904040743-200904040744
200904040744-200904040745
200904040745-200904040746

200904040749-200904040750

have the very same object with the same error ... Will this problem
be fixed with API 0.6? It's quite annoying to fix this again and again
manually.

Flo
-- 
Florian Lohoff  f...@rfc822.org +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin

