[OSM-talk] Osmosis UTF-8 problem (again)

2008-01-11 Thread Martijn van Oosterhout
Looks like there's an issue with UTF-8 characters in the username.

Line 42117 of daily-20080109-20080110.osc is an example (node 32268361).

Have a nice day,
-- 
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/

___
talk mailing list
talk@openstreetmap.org
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/talk


Re: [OSM-talk] Osmosis UTF-8 problem (again)

2008-01-11 Thread Tom Hughes
On 11/01/2008, Brett Henderson [EMAIL PROTECTED] wrote:

 Martijn van Oosterhout wrote:
  Looks like there's an issue with UTF-8 characters in the username.
 
  Line 42117 of daily-20080109-20080110.osc is an example (node 32268361).
 
  Have a nice day,

 Any idea what the user name should be? I find it hard to believe that
 user=jos逴巜¯(R)退 (from the API) is correct.

The name doesn't make any more sense in the MySQL command-line client,
so I don't think it's an Osmosis problem.
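
One way to check Tom's point independently of Osmosis (that the stored bytes are already bad) is to run the raw bytes of the user name through a strict decoder. This is a hypothetical sketch, not part of Osmosis or the API: it assumes you have already fetched the user name as a byte[] by some means, and uses Java's CharsetDecoder with CodingErrorAction.REPORT so that a malformed sequence raises an exception instead of being silently replaced. The class name Utf8Check and the sample bytes are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the whole byte array is well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting a replacement.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "josé" encoded as UTF-8: 6A 6F 73 C3 A9
        byte[] good = "jos\u00e9".getBytes(StandardCharsets.UTF_8);
        // "josé" encoded as ISO-8859-1: 6A 6F 73 E9 (a lone E9 is not valid UTF-8)
        byte[] bad = {0x6A, 0x6F, 0x73, (byte) 0xE9};
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

A Latin-1 "é" (0xE9) stored where UTF-8 was expected fails the check, while the proper UTF-8 form (0xC3 0xA9) passes. Which of these, if either, happened to this particular user name is not established in the thread.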

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/


Re: [OSM-talk] Osmosis UTF-8 problem (again)

2008-01-11 Thread Frederik Ramm
Hi,

 Any idea what the user name should be? I find it hard to believe that 
 user=jos??¯® (from the API) is correct.

Well, on 5 December I had a problem with the planet diff too; quoting
from an old e-mail:



   The latest daily planet diff has a UTF-8 problem on line 58267:
   node id=25254929 timestamp=2007-12-04T17:26:52Z user=josé ...
   Seems like the user names don't get encoded properly.



Username looks conspicuously similar ;)

Bye
Frederik

-- 
Frederik Ramm  ##  eMail [EMAIL PROTECTED]  ##  N49°00.09' E008°23.33'




Re: [OSM-talk] Osmosis UTF-8 problem (again)

2008-01-11 Thread Brett Henderson
Frederik Ramm wrote:
 Hi,

   
 Any idea what the user name should be? I find it hard to believe that 
 user=jos??¯® (from the API) is correct.
 

 Well on 05 December I did have a problem with the planet diff, quoting
 from old E-Mail:

   

 The latest daily planet diff has a UTF-8 problem on line 58267:
 node id=25254929 timestamp=2007-12-04T17:26:52Z user=josé ...
 Seems like the user names don't get encoded properly.

 

 Username looks conspicuously similar ;)
   
I remember that email; I was hoping the problem would magically
disappear ;-)

Checking the history of that node from the API again gives user=jos逴巊 »H´
(hopefully this is coming through okay; it includes a bunch of
Chinese-looking characters).

I'll check it out in more detail soon. It does look like it should be
user=josé, but given that the API is also returning interesting data,
it sounds like there's a deeper problem somewhere. Either way, Osmosis
shouldn't be emitting invalid UTF-8, but fixing it may not be easy. It
might have something to do with characters that can't be represented
in a single 16-bit Java char (code points above U+FFFF, which need a
surrogate pair). If it does turn out to be a problem elsewhere, I can
try to put a hack in place to at least emit valid UTF-8, but it will
require me to do some more reading of the Unicode standards, which I'm
not excited about :-)
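
If the cause does turn out to be characters outside the 16-bit range, a cheap version of the "at least emit valid UTF-8" hack is to scrub unpaired surrogates before writing. This is a minimal sketch under that assumption (which is only a guess in this thread), not Osmosis code; SafeUtf8 and replaceLoneSurrogates are hypothetical names. It relies only on String.codePointAt, Character.charCount and StringBuilder.appendCodePoint from the standard library.

```java
import java.nio.charset.StandardCharsets;

public class SafeUtf8 {
    // Java chars are 16-bit UTF-16 code units. Code points above U+FFFF are
    // stored as a high + low surrogate pair; a surrogate on its own has no
    // valid UTF-8 encoding, so it is swapped for U+FFFD REPLACEMENT CHARACTER.
    public static String replaceLoneSurrogates(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt combines a valid pair into one code point, but
            // returns an unpaired surrogate unchanged (value D800..DFFF).
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                out.append('\uFFFD');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by 2 for a combined pair, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = "a\uD83D\uDE00b"; // U+1F600 as a valid surrogate pair
        String broken = "jos\uD800";     // hypothetical lone high surrogate
        System.out.println(replaceLoneSurrogates(emoji).equals(emoji)); // true
        String fixed = replaceLoneSurrogates(broken);
        // "jos" (3 bytes) + U+FFFD (EF BF BD, 3 bytes)
        System.out.println(fixed.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```

Valid pairs pass through unchanged, so genuine supplementary characters survive; only surrogates that cannot be encoded at all are replaced. Substituting U+FFFD explicitly, rather than relying on whatever replacement the encoder applies, makes the damage visible in the output.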

