So, the solution is to just provide a patch with more escaping cases in
http://trac.openstreetmap.org/browser/applications/utils/osmosis/src/com/bretth/osmosis/core/xml/common/ProductionDbDataDecoder.java
and
http://trac.openstreetmap.org/browser/applications/utils/osmosis/src/com/bretth/osmosis/core/xml/common/ProductionDbDataEncoder.java
and hope they work fine?
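
To make the problem concrete, here is a minimal Java sketch of the double encoding Martijn describes in the quoted mail below, and of how it can be undone. This is not the osmosis code: the class and method names are made up for illustration, and I am assuming the "windows" encoding in the first step is windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the osmosis ProductionDbDataDecoder. */
class DoubleEncodingSketch {
    // Assumption: the wrongly assumed connection encoding is windows-1252.
    static final Charset WRONG = Charset.forName("windows-1252");

    /** Simulates the fault: UTF-8 bytes are misread as windows-1252, then encoded as UTF-8 again. */
    static byte[] doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, WRONG);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    /** Reverses the fault for characters that windows-1252 can represent. */
    static String repair(byte[] stored) {
        String misread = new String(stored, StandardCharsets.UTF_8);
        byte[] utf8 = misread.getBytes(WRONG);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "\u00ef"; // the 0xef character from the example
        byte[] stored = doubleEncode(original);
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8))); // c3 af
        System.out.println(hex(stored));                                    // c3 83 c2 af
        System.out.println(repair(stored).equals(original));                // true
    }
}
```

The sketch only covers the easy case. Bytes that windows-1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d) are not handled here and may not survive the round trip, which is presumably where the extra escaping cases would be needed.
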
It would of course be better in the long run to fix the main DB, but I'm not sure what all that would entail. Probably a lot.

Stefan

On Dec 19, 2007 10:36 PM, Brett Henderson <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I've lost my home ADSL (won't line sync, tried two modems, tried different
> leads, doesn't seem to be my end) so I'm mostly offline. As a result I'm
> unlikely to get onto this issue in the short term. With Christmas
> approaching I'm bracing myself for a long'ish outage.
>
> If anybody wishes to take a look, the hacked character encoding class is
> named ProductionDbCharset and has two related classes named
> ProductionDbDataEncoder and ProductionDbDataDecoder.
>
> The classes are instantiated within BaseXmlWriter which is extended by the
> XmlWriter class for writing osm files and XmlChangeWriter for osc files.
> The hack works by just passing the doubly encoded data through the osmosis
> pipeline then fixing it before writing to xml.
>
> Not sure how easy it will be to fix without access to a doubly encoded
> database though.
>
> Brett
>
>
> On 12/20/07, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
> >
> > On Dec 18, 2007 1:04 PM, Stefan Baebler <[EMAIL PROTECTED]> wrote:
> > > I somehow assumed utf8 would be the default choice by now. Also
> > > http://wiki.openstreetmap.org/index.php/Database_schema
> > > mentions utf8 explicitly for every table individually.
> > >
> > > Why does main api work nicely then?
> > > Why are full planet dumps ok?
> >
> > There's an encoding issue in that what the ruby server thinks the
> > encoding is differs from what the database encoding actually is. The
> > net result is that the data is encoded *twice*. For example (not actual
> > codes, just examples):
> >
> > Original char: character 0xef
> > Encoded as: 0xc3 0xaf
> > Stored as: 0xc0 0xc3 0xc0 0xbf
> >
> > > And more importantly:
> > > How can the same magic be used to get properly utf8 encoded hourly
> > > changes (.osc)?
> >
> > Osmosis is in Java which is smart enough to not let you do stupid
> > things like getting the database connection encoding wrong. It's just a
> > question of fixing the de-double-encoding-hack in osmosis. It doesn't
> > help that it's a *windows* encoding in the first step.
> >
> > Have a nice day,
> > --
> > Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> >
> > _______________________________________________
> > dev mailing list
> > dev@openstreetmap.org
> > http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev