Oh, sorry, Derek. My bad. I didn't mean to imply that you were saying 
that the situation was optimal. I understood where you were coming from. 
Actually, after my first sentence I wasn't really addressing your 
comment. I should have made that clear. Haven't had my coffee yet...

This is kind of important to me. I have a site that is sponsored by some 
big liquor companies. Many of them are European, and then the Brazilian 
ones are all selling cachaça. Eliminating accents and changing ç to c 
does not make them happy, which does not make my client happy. And I 
can't explain to them why I can't help it because their sites all work 
fine with ç. So I spent more than 40 hours this week, mostly between 
midnight and 6 AM, inputting data that my client could have input 
themselves because I didn't want them to have to deal with this problem. 
That was above and beyond the 40+ hours I spent programming.

Now I have to go back and change all those after we figure this out. So 
it's a pretty major issue for me at the moment.

I'm thinking that as a workaround, I can go change things directly in 
the database and see if that helps. Ugh. That's gonna mean another week 
of no sleep.
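
If the database route turns out to be necessary, here is a minimal sketch of how the garbled values could be repaired in bulk rather than retyped. It assumes Marc's diagnosis (quoted below) is right, i.e. that the stored text is UTF-8 bytes that were read as Latin-1, and that each value was garbled exactly once. MojibakeRepair, repair and repairIfGarbled are made-up names for illustration, not anything in Lift:

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper, not part of Lift.
object MojibakeRepair {
  // Undo one round of "UTF-8 bytes read as Latin-1": turn each char back
  // into its Latin-1 byte, then decode the resulting bytes as UTF-8.
  def repair(garbled: String): String =
    new String(garbled.getBytes(ISO_8859_1), UTF_8)

  // Only accept the repair if it round-trips cleanly and produced no
  // replacement characters; otherwise return the input unchanged.
  def repairIfGarbled(s: String): String = {
    val candidate = repair(s)
    val roundTrips = new String(candidate.getBytes(UTF_8), ISO_8859_1) == s
    if (roundTrips && !candidate.contains('\uFFFD')) candidate else s
  }
}
```

With this, repairIfGarbled("cacha\u00c3\u00a7a") (that is, "cachaÃ§a") should give back "cachaça", while a value that is already correct is left alone. If the misread was really Windows-1252 rather than Latin-1, that charset would have to be swapped in, so it is worth trying on a copy of the data first.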

Can you point me to the spot in Lift code where this all happens? I'd 
love to be part of the solution instead of just the guy who points 
things out.
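
In the meantime, here is a small sketch that reproduces the byte values from Derek's log (quoted below) using nothing but the JDK. It assumes, and this is only a guess, that the form body is being URL-decoded as ISO-8859-1 somewhere before the submit function sees it. DecodeDemo and hex are names invented for the example:

```scala
import java.net.URLDecoder
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical demo object, not part of Lift.
object DecodeDemo {
  // Render a string's UTF-8 bytes as space-separated lowercase hex.
  def hex(s: String): String =
    s.getBytes(UTF_8).map(b => f"${b & 0xff}%02x").mkString(" ")

  // The field value from the packet capture, decoded with the right charset...
  val right: String = URLDecoder.decode("%C3%A7", "UTF-8")

  // ...and decoded as if the bytes were Latin-1 (the suspected bug).
  val wrong: String = URLDecoder.decode("%C3%A7", "ISO-8859-1")
}
```

hex(right) comes out as c3 a7 and hex(wrong) as c3 83 c2 a7, the same four bytes as in the log, which suggests looking for wherever the request's character encoding gets chosen.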

Chas.

Derek Chen-Becker wrote:
> Sorry, I'm not suggesting that this is the appropriate method for users; 
> they should just be able to type. I was just trying to explain why the 
> "&" is getting expanded. I think that the current behavior is not really 
> what anyone wants, and hopefully we can fix it in a transparent manner.
> 
> Derek
> 
> On Sun, Mar 15, 2009 at 2:38 PM, Charles F. Munat <c...@munat.com> wrote:
> 
> 
>     Unfortunately, there is no easy way to do that with user input. But the
>     use of character entity references is problematic in itself. I can't
>     teach all my site's users all the references they will need, nor is it
>     really reasonable to expect, for example, an international group of
>     users to have to hand code every accented character.
> 
>     There must be a way to input UTF-8 and have it come out properly. I've
>     set the keyboard on my Mac to U.S. Extended, which makes everything
>     UTF-8. I note that *most* of the keyboards available for the Mac are
>     UTF-8 (though the default U.S. keyboard is Roman, and there are many
>     European keyboards that are Roman or Cyrillic).
> 
>     Ideally, Lift would recognize the character encoding and act
>     appropriately. (I'd be happy to convert everything to UTF-8.) Another
>     possibility, much less preferred but at least workable, would be to add
>     the ability for the user to select the character encoding (they could
>     use trial and error if they weren't sure).
> 
>     But the upshot is that someone with a keyboard set to UTF-8 (which
>     includes much of the world) should be able to use that keyboard and have
>     it come out the same way it went in. I have no idea how to accomplish
>     this, however, as I don't know how that part of Lift works.
> 
>     Chas.
> 
>     Derek Chen-Becker wrote:
>      > The Scala XML syntax automatically converts any "&" in embedded
>      > strings to "&amp;". You have to put the string inside a
>      > scala.xml.Unparsed node to prevent that from happening.
>      >
>      > Derek
>      >
>      > On Sun, Mar 15, 2009 at 1:59 PM, Charles F. Munat <c...@munat.com> wrote:
>      >
>      >
>      >     That was my thinking. It doesn't explain why &ccedil; in gets
>      >     changed to &amp;ccedil;, but it explains why ç in becomes Ã§
>      >     out. So I think there are two separate issues here.
>      >
>      >     The ç can be created in two different ways in UTF-8. One is
>      >     the single "c with a cedilla" character. The second is a c
>      >     character followed by a cedilla character. I am not sure how
>      >     UTF-8 indicates that these two characters should be displayed
>      >     as one. Neither am I sure that this has anything to do with
>      >     the problem. Maybe it is simply that something is assuming
>      >     Latin1 input even though the input is UTF-8.
>      >
>      >     It is definitely on the front end, because it is stored in
>      >     the database as Ã§.
>      >
>      >     When I use &ccedil; instead, the problem is that it is *not*
>      >     converted to ç as it goes into the database, and then on the
>      >     way out the XML interpreter does not recognize it as a
>      >     character entity reference and so converts the & to &amp;.
>      >
>      >     Chas.
>      >
>      >     Marc Boschma wrote:
>      >      > Now that I have some breakfast in me: to be clear, it appears
>      >      > that the UTF-8 byte stream is being interpreted as Latin1 and
>      >      > then converted to Unicode...
>      >      >
>      >      > Marc
>      >      > On 16/03/2009, at 6:25 AM, Marc Boschma wrote:
>      >      >
>      >      >> excuse the typo:
>      >      >> On 16/03/2009, at 6:23 AM, Marc Boschma wrote:
>      >      >>
>      >      >>> Just looking at http://jeppesn.dk/utf-8.html , I found the
>      >      >>> following lines:
>      >      >>> Character   Latin1 code   Unicode   UTF-8   Latin1 interpr.
>      >      >>> ç           E7            00 E7     C3 A7   Ã§
>      >      >>> Ã is C38C, § is C2 A7
>      >      >> Ã is C383
>      >      >>> So it appears that somewhere there is a translation to
>      >      >>> Latin 1 going on.
>      >      >>> Hopefully that helps somewhat...
>      >      >>> Regards,
>      >      >>> Marc
>      >      >>>
>      >      >>> On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
>      >      >>>
>      >      >>>> This is really interesting. I've narrowed it down to
>      >      >>>> something on form submission. The database shows gibberish,
>      >      >>>> too, and if I manually enter the correct value in the DB it
>      >      >>>> works fine on display. If I print the UTF-8 byte values of
>      >      >>>> the string I get from the browser for my description when I
>      >      >>>> submit a cedilla (ç), I see:
>      >      >>>>
>      >      >>>> INFO - Submitted desc bytes = c3 83 c2 a7
>      >      >>>>
>      >      >>>> A cedilla is c3 a7 in UTF-8, so I'm not sure where the
>      >      >>>> "83 c2" is coming from. I googled around a bit and I found
>      >      >>>> other people having the same issue but it wasn't clear in
>      >      >>>> those posts what the cause was. I did a packet capture just
>      >      >>>> as a sanity check, and here's what I got:
>      >      >>>>
>      >      >>>> POST / HTTP/1.1
>      >      >>>> ... headers here ...
>      >      >>>>
>      >      >>>>
>      >      >>>> F956759623045OFT=true&F956759623046BU5=1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=%C3%A7&F956759623049S3E=3&F956759623050E25=test
>      >      >>>>
>      >      >>>> As you can see, the (URL-encoded) value of the
>      >      >>>> F956759623048IZR field (description) is %C3%A7, so something
>      >      >>>> isn't properly converting that. Helpers.urlDecode seems to be
>      >      >>>> working properly:
>      >      >>>>
>      >      >>>> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
>      >      >>>> res1: java.lang.String = F956759623048IZR=ç
>      >      >>>>
>      >      >>>> So I have no idea where this is coming from. All I know is
>      >      >>>> that between the actual POST and when my submit function is
>      >      >>>> called, something is tweaking the string. I'm going to dig
>      >      >>>> some more, but I wanted to post this in case it triggers any
>      >      >>>> thoughts out there.
>      >      >>>>
>      >      >>>> Derek
>      >      >>>>
>      >      >>>> PS - I just found this:
>      >      >>>>
>      >      >>>>
>      >      >>>> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
>      >      >>>>
>      >      >>>> May be related?
>      >      >>>>
>      >      >>>> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker
>      >      >>>> <dchenbec...@gmail.com> wrote:
>      >      >>>>
>      >      >>>>     OK, I can replicate this in our PocketChange app (also
>      >      >>>>     going against a PostgreSQL DB). Let me dig a bit.
>      >      >>>>
>      >      >>>>     Derek
>      >      >>>>
>      >      >>>>
>      >      >>>>     On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat
>      >      >>>>     <c...@munat.com> wrote:
>      >      >>>>
>      >      >>>>
>      >      >>>>         This might help, but I don't think I was clear. I
>      >      >>>>         have an online form. My clients enter text into it.
>      >      >>>>         Their text has characters like a c with a cedilla.
>      >      >>>>         That text gets saved into a PostgreSQL database
>      >      >>>>         (UTF-8) varchar field via JPA/Hibernate.
>      >      >>>>
>      >      >>>>         Then I pull it back out and dump it into a template,
>      >      >>>>         and it comes out gibberish. If I try using &ccedil;
>      >      >>>>         instead, I get &amp;ccedil; back out.
>      >      >>>>
>      >      >>>>         Here is what I have:
>      >      >>>>
>      >      >>>>         "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
>      >      >>>>
>      >      >>>>         If I enter "cachaça" in the field, I get cachaÃ§a
>      >      >>>>         back out. The weird thing is that sometimes when I
>      >      >>>>         copy and paste text from another document into the
>      >      >>>>         form, it works. But if I use the keyboard, it fails
>      >      >>>>         every time.
>      >      >>>>
>      >      >>>>         I'll play around with this. Thanks.
>      >      >>>>
>      >      >>>>         Chas.
>      >      >>>>
>      >      >>>>         Derek Chen-Becker wrote:
>      >      >>>>         > Oops, forgot scala.xml.Unparsed, too:
>      >      >>>>         >
>      >      >>>>         > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
>      >      >>>>         > m: scala.xml.Elem = <span>a&ccedil;b</span>
>      >      >>>>         >
>      >      >>>>         > That one might be what you're looking for.
>      >      >>>>         >
>      >      >>>>         > Derek
>      >      >>>>         >
>      >      >>>>         > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker
>      >      >>>>         > <dchenbec...@gmail.com> wrote:
>      >      >>>>         >
>      >      >>>>         >     I think it depends on how you're embedding them
>      >      >>>>         >     in the XML:
>      >      >>>>         >
>      >      >>>>         >     scala> val m = <span>a&ccedil;b</span>
>      >      >>>>         >     m: scala.xml.Elem = <span>a&ccedil;b</span>
>      >      >>>>         >
>      >      >>>>         >     scala> val m = <span>a{"&ccedil;"}b</span>
>      >      >>>>         >     m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
>      >      >>>>         >
>      >      >>>>         >     scala> val m = <span>a{"ç"}b</span>
>      >      >>>>         >     m: scala.xml.Elem = <span>açb</span>
>      >      >>>>         >
>      >      >>>>         >     That last one was input using dead keys (alt+,)
>      >      >>>>         >     on my Linux (USA International with dead keys)
>      >      >>>>         >     layout. Let me know if this doesn't help; if not,
>      >      >>>>         >     could you send the code/template that's having
>      >      >>>>         >     issues?
>      >      >>>>         >
>      >      >>>>         >     Derek
>      >      >>>>         >
>      >      >>>>         >
>      >      >>>>         >     On Sat, Mar 14, 2009 at 6:36 PM, Charles F. Munat
>      >      >>>>         >     <c...@munat.com> wrote:
>      >      >>>>         >
>      >      >>>>         >
>      >      >>>>         >         I have a site that uses a lot of "special"
>      >      >>>>         >         characters (a remarkably biased description,
>      >      >>>>         >         since there is nothing "special" about
>      >      >>>>         >         accented characters to the people who use
>      >      >>>>         >         them daily). In particular, I need the c with
>      >      >>>>         >         cedilla and the n with the tilde.
>      >      >>>>         >
>      >      >>>>         >         These characters are being input to a
>      >      >>>>         >         database (UTF-8) via an online form, then
>      >      >>>>         >         spit back out onto the page.
>      >      >>>>         >
>      >      >>>>         >         It's a fucking disaster. Apparently,
>      >      >>>>         >         everything goes through the XML parser, which
>      >      >>>>         >         is great, except when I try to enter these as
>      >      >>>>         >         entity references, such as &ccedil;, the
>      >      >>>>         >         parser changes & to &amp; and I get the
>      >      >>>>         >         literal &ccedil; back out again.
>      >      >>>>         >
>      >      >>>>         >         When I type ç using the keyboard (or copy and
>      >      >>>>         >         paste it from a page or a text editor), I get
>      >      >>>>         >         gibberish.
>      >      >>>>         >
>      >      >>>>         >         Anyone know the trick to getting around this?
>      >      >>>>         >         I need everything from e acute to e grave to
>      >      >>>>         >         trademark and registered trademark symbols,
>      >      >>>>         >         and I need to enter them this way.
>      >      >>>>         >
>      >      >>>>         >         Thanks for any help. If I can get this to
>      >      >>>>         >         work, I'll add an explanation to the wiki.
>      >      >>>>         >
>      >      >>>>         >         Chas.
