Now that I have some breakfast in me, to be clear: it appears that the
UTF-8 byte stream is being interpreted as Latin1 and then converted to
Unicode...
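
For what it's worth, here is a minimal sketch of that theory in plain Scala, using only the JDK charset classes. The object and method names (DoubleEncodingDemo, misdecode, repair) are made up for illustration and are not part of Lift. It assumes the culprit is a strict ISO-8859-1 decode somewhere between the POST and the submit function; I have not confirmed where that actually happens.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper object, for illustration only (not a Lift API).
object DoubleEncodingDemo {
  // Render bytes as space-separated lowercase hex, e.g. "c3 a7".
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  // Simulate the suspected bug: take the UTF-8 bytes of a string
  // and decode them as if they were Latin-1 (ISO-8859-1).
  def misdecode(s: String): String =
    new String(s.getBytes(UTF_8), ISO_8859_1)

  // Undo the damage: re-encode as Latin-1 to recover the original
  // bytes, then decode those bytes as UTF-8.
  def repair(s: String): String =
    new String(s.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    val original = "\u00e7"              // c with cedilla
    val garbled  = misdecode(original)   // two chars: U+00C3, U+00A7

    println(hex(original.getBytes(UTF_8)))  // c3 a7
    println(hex(garbled.getBytes(UTF_8)))   // c3 83 c2 a7
    println(repair(garbled) == original)    // true
  }
}
```

The second line of output matches the "c3 83 c2 a7" that Derek logged, which is why I suspect a Latin1 decode. The repair function is only a diagnostic aid; the real fix would be to make whatever decodes the request use UTF-8 in the first place.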

Marc
On 16/03/2009, at 6:25 AM, Marc Boschma wrote:

> excuse the typo:
> On 16/03/2009, at 6:23 AM, Marc Boschma wrote:
>
>> Just looking at http://jeppesn.dk/utf-8.html , I found the  
>> following lines:
>> Character  Latin1 code  Unicode code  UTF-8  Latin1 interpr.
>> ç          E7           00 E7         C3 A7  Ã§
>> Ã is C38C, § is C2 A7
> Ã is C383
>> So it appears that somewhere there is a translation to Latin 1  
>> going on.
>> Hopefully that helps somewhat...
>> Regards,
>> Marc
>>
>> On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
>>
>>> This is really interesting. I've narrowed it down to something on  
>>> form submission. The database shows gibberish, too, and if I  
>>> manually enter the correct value in the DB it works fine on  
>>> display. If I print the UTF-8 byte values of the string I get from  
>>> the browser for my description when I submit a cedilla (ç), I see:
>>>
>>> INFO - Submitted desc bytes = c3 83 c2 a7
>>>
>>> A cedilla is c3 a7 in UTF-8, so I'm not sure where the "83 c2" is  
>>> coming from. I googled around a bit and I found other people  
>>> having the same issue but it wasn't clear in those posts what the  
>>> cause was. I did a packet capture just as a sanity check, and  
>>> here's what I got:
>>>
>>> POST / HTTP/1.1
>>> ... headers here ...
>>>
>>> F956759623045OFT=true&F956759623046BU5=1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=%C3%A7&F956759623049S3E=3&F956759623050E25=test
>>>
>>> As you can see, the (url encoded) value of the F956759623048IZR  
>>> field (description) is %C3%A7, so something isn't properly  
>>> converting that. Helpers.urlDecode seems to be working properly:
>>>
>>> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
>>> res1: java.lang.String = F956759623048IZR=ç
>>>
>>> So I have no idea where this is coming from. All I know is that  
>>> between the actual POST and when my submit function is called,  
>>> something is tweaking the string. I'm going to dig some more, but  
>>> I wanted to post this in case it triggers any thoughts out there.
>>>
>>> Derek
>>>
>>> PS - I just found this:
>>>
>>> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
>>>
>>> May be related?
>>>
>>> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker <dchenbec...@gmail.com> wrote:
>>> OK, I can replicate this in our PocketChange app (also going  
>>> against a PostgreSQL DB). Let me dig a bit.
>>>
>>> Derek
>>>
>>>
>>> On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat <c...@munat.com>  
>>> wrote:
>>>
>>> This might help, but I don't think I was clear. I have an online form.
>>> My clients enter text into it. Their text has characters like a c with a
>>> cedilla. That text gets saved into a PostgreSQL database (UTF-8) varchar
>>> field via JPA/Hibernate.
>>>
>>> Then I pull it back out and dump it into a template, and it comes out
>>> gibberish. If I try using &ccedil; instead, I get &amp;ccedil; back out.
>>>
>>> Here is what I have:
>>>
>>> "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
>>>
>>> If I enter "cachaça" in the field, I get cachaÃ§a back out. The weird
>>> thing is that sometimes when I copy and paste text from another document
>>> into the form, it works. But if I use the keyboard, it fails every time.
>>>
>>> I'll play around with this. Thanks.
>>>
>>> Chas.
>>>
>>> Derek Chen-Becker wrote:
>>> > Oops, forgot scala.xml.Unparsed, too:
>>> >
>>> > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
>>> > m: scala.xml.Elem = <span>a&ccedil;b</span>
>>> >
>>> > That one might be what you're looking for.
>>> >
>>> > Derek
>>> >
>>> > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker
>>> > <dchenbec...@gmail.com <mailto:dchenbec...@gmail.com>> wrote:
>>> >
>>> >     I think it depends on how you're embedding them in the XML:
>>> >
>>> >     scala> val m = <span>a&ccedil;b</span>
>>> >     m: scala.xml.Elem = <span>a&ccedil;b</span>
>>> >
>>> >     scala> val m = <span>a{"&ccedil;"}b</span>
>>> >     m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
>>> >
>>> >     scala> val m = <span>a{"ç"}b</span>
>>> >     m: scala.xml.Elem = <span>açb</span>
>>> >
>>> >     That last one was input using dead keys (alt+,) on my Linux (USA
>>> >     International with dead keys) layout. Let me know if this doesn't
>>> >     help; if not, could you send the code/template that's having issues?
>>> >
>>> >     Derek
>>> >
>>> >
>>> >     On Sat, Mar 14, 2009 at 6:36 PM, Charles F. Munat <c...@munat.com
>>> >     <mailto:c...@munat.com>> wrote:
>>> >
>>> >
>>> >         I have a site that uses a lot of "special" characters (a
>>> >         remarkably biased description, since there is nothing
>>> >         "special" about accented characters to the people who use
>>> >         them daily). In particular, I need the c with cedilla and
>>> >         the n with the tilde.
>>> >
>>> >         These characters are being input to a database (UTF-8) via
>>> >         an online form, then spit back out onto the page.
>>> >
>>> >         It's a fucking disaster. Apparently, everything goes through
>>> >         the xml parser, which is great, except when I try to enter
>>> >         these as entity references, such as &ccedil;, the parser
>>> >         changes & to &amp; and I get the literal &ccedil; back out
>>> >         again.
>>> >
>>> >         When I type ç using the keyboard (or copy and paste it from
>>> >         a page or a text editor), I get gibberish.
>>> >
>>> >         Anyone know the trick to getting around this? I need
>>> >         everything from e acute to e grave to trademark and
>>> >         registered trademark symbols, and I need to enter them this
>>> >         way.
>>> >
>>> >         Thanks for any help. If I can get this to work, I'll add an
>>> >         explanation to the wiki.
>>> >
>>> >         Chas.
>>> >

