Re: What if Tapestry's I18N was just UTF-8?
+1 to UTF-8. IMHO it's an advantage that T5 will take care of it. As for the one-/two-byte penalty, one can always use gzipped output :)

2008/7/30 Josh Long:
> +1 me too

--
Best regards,
Renat Zubairov
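A minimal JDK-only sketch of Renat's point. It assumes the server applies gzip to the response body; how a container or Tapestry would wire that up is not shown. The class name `GzipPenaltyDemo` and the sample sentence are illustrative. The sketch encodes a repetitive Portuguese page ("Não há ação sem razão.") as ISO-8859-1 and as UTF-8, then gzips the UTF-8 bytes. On text this repetitive, the compressed UTF-8 body is far smaller than even the uncompressed ISO-8859-1 body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPenaltyDemo {

    // Compress a response body with gzip, as a server would for Content-Encoding: gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // valid after close(), which finishes the gzip trailer
    }

    // Decompress, as the browser would on receipt.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A short Portuguese sentence with five accented letters, written with
        // Unicode escapes so the source file stays ASCII-only.
        String sentence = "N\u00e3o h\u00e1 a\u00e7\u00e3o sem raz\u00e3o. ";
        String page = sentence.repeat(200); // Java 11+

        byte[] latin1 = page.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = page.getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(utf8);

        System.out.println("ISO-8859-1:   " + latin1.length + " bytes"); // 4600
        System.out.println("UTF-8:        " + utf8.length + " bytes");   // 5600
        System.out.println("UTF-8 + gzip: " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(gunzip(compressed), utf8));
    }
}
```

Real pages compress less dramatically than a sentence repeated 200 times, so treat the numbers as an illustration rather than a benchmark.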
Re: What if Tapestry's I18N was just UTF-8?
On Wed, Jul 30, 2008 at 1:29 AM, Renat Zubairov wrote:
> +1 to UTF-8. IMHO it's an advantage that T5 will take care of it. As for the one-/two-byte penalty, one can always use gzipped output :)

Something that Tapestry 5.1 should just take care of automatically.

--
Howard M. Lewis Ship
Creator Apache Tapestry and Apache HiveMind

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: What if Tapestry's I18N was just UTF-8?
On Tue, Jul 29, 2008 at 2:17 AM, Howard Lewis Ship wrote:
> What if there was just a single default application character set, which would default to UTF-8?

Howard, how would this fit with existing databases and/or other data sources (files, for example) that are already encoded as ISO-8859-1?

--
Massimo
http://meridio.blogspot.com
Re: What if Tapestry's I18N was just UTF-8?
Well, it's not like we're pushing a bytestream from the web browser to the database, or vice versa. Everything is read into memory as Unicode (Java strings), whether it starts as UTF-8 in the browser or ISO-8859-1 in the database. As it's read from one source or written to another, the character set is going to change.

My observation is that the current design, allowing every page to have its own charset, is beginning to feel like overkill, especially given that the solution has a number of frayed edges.

On Tue, Jul 29, 2008 at 2:28 AM, Massimo Lusetti wrote:
> Howard, how would this fit with existing databases and/or other data sources (files, for example) that are already encoded as ISO-8859-1?

--
Howard M. Lewis Ship
Creator Apache Tapestry and Apache HiveMind
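Howard's point that charsets only matter at the I/O boundaries can be illustrated with plain JDK calls. In this sketch (class and method names are hypothetical), bytes from a legacy ISO-8859-1 source are decoded into a Java `String`. The same `String` is then re-encoded as UTF-8 for the browser. The in-memory text is identical; only the byte representation changes at each edge.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundaryTranscoding {

    // Decode bytes using the charset of the source they came from.
    static String readFrom(byte[] bytes, Charset sourceCharset) {
        return new String(bytes, sourceCharset);
    }

    // Encode text using the charset of the destination it is written to.
    static byte[] writeTo(String text, Charset targetCharset) {
        return text.getBytes(targetCharset);
    }

    public static void main(String[] args) {
        // Portuguese "acao" with c-cedilla and a-tilde, as a legacy ISO-8859-1
        // column would store it: one byte per character.
        byte[] fromDatabase = {(byte) 'a', (byte) 0xE7, (byte) 0xE3, (byte) 'o'};

        String text = readFrom(fromDatabase, StandardCharsets.ISO_8859_1);
        byte[] toBrowser = writeTo(text, StandardCharsets.UTF_8);

        // Reading the browser's UTF-8 bytes back yields the same String.
        String roundTrip = readFrom(toBrowser, StandardCharsets.UTF_8);
        System.out.println(text.equals(roundTrip));                          // true
        System.out.println(fromDatabase.length + " -> " + toBrowser.length); // 4 -> 6
    }
}
```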
Re: What if Tapestry's I18N was just UTF-8?
On Tue, 29 Jul 2008 11:39:21 -0300, Howard Lewis Ship wrote:
> My observation is that the current design, allowing every page to have its own charset, is beginning to feel like overkill, especially given that the solution has a number of frayed edges.

What about setting a single charset for all pages, but keeping the option of choosing which charset that is?

Thiago
Re: What if Tapestry's I18N was just UTF-8?
Hi there,

can't we just go with UTF-8 by default (pages, forms, etc.) and specify only deviations from this default (globally, locally, or even in onActivate or via the actual writer)? Often you end up using UTF-8 by default anyway and messing with filters, forms and so on. This is indeed annoying (Howard gets my vote on this one).

@Thiago: I don't get the point about the database and output streams not being UTF-8. As Howard said, everything becomes a Java String and is already encoded in UTF (UTF-16, internally). The only way I know of for this to become an issue is if one 'accidentally' fails to convert the database strings to UTF correctly and then 'accidentally' converts these strings back to the same character set (which should blow up for certain high-byte values). So I guess that converting something to strings should always be correct. If I missed the point, please correct me.

Cheers,
Martin (Kersten)

-----Original Message-----
From: Thiago H. de Paula Figueiredo
Sent: Tuesday, 29 July 2008 16:43
To: Tapestry users
Subject: Re: What if Tapestry's I18N was just UTF-8?

> What about setting a single charset for all pages, but keeping the option of choosing which charset that is?
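Thiago's and Martin's suggestion (one application-wide charset, UTF-8 by default, with explicit exceptions only where needed) boils down to a lookup with a fallback. `CharsetPolicy` below is a hypothetical illustration, not a Tapestry API. It only shows the rule "use a configured override if there is one, otherwise the application default".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy object: one application default plus optional per-page overrides.
public class CharsetPolicy {
    private final Charset applicationDefault;
    private final Map<String, Charset> pageOverrides = new HashMap<>();

    public CharsetPolicy(Charset applicationDefault) {
        this.applicationDefault = applicationDefault;
    }

    // Register an exception to the default for a single page; returns this for chaining.
    public CharsetPolicy override(String pageName, Charset charset) {
        pageOverrides.put(pageName, charset);
        return this;
    }

    // The charset to use both when rendering the page and when decoding its form submissions.
    public Charset charsetFor(String pageName) {
        return pageOverrides.getOrDefault(pageName, applicationDefault);
    }

    public static void main(String[] args) {
        CharsetPolicy policy = new CharsetPolicy(StandardCharsets.UTF_8)
                .override("LegacyReport", StandardCharsets.ISO_8859_1);
        System.out.println(policy.charsetFor("Index").name());        // UTF-8
        System.out.println(policy.charsetFor("LegacyReport").name()); // ISO-8859-1
    }
}
```

Whatever the real mechanism, the resolved charset would presumably need to drive both output and request decoding for the same page; the edge cases Howard describes seem to arise when those two disagree.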
Re: What if Tapestry's I18N was just UTF-8?
I agree with using UTF-8 as the default... We've been using the UTF-8 filter for a while with a couple of different databases (PostgreSQL and SQL Server, encoded with ISO-8859-1) and we've never had any problem with it. +1

--
Regards,
Marcelo Lotif
Java and Tapestry Programmer
FIEC - Federação das Indústrias do Estado do Ceará
(85) 3477-5910
Re: What if Tapestry's I18N was just UTF-8?
Strings are stored in UTF-16 by default anyway in Java (a char is 16 bits), so unless you're constructing the strings in very creative ways, you're already paying the memory cost. What you output to the browser or the database is a matter of adaptation and transmission/communication time.

Christian.

On 29-Jul-08, at 12:53, Marcelo Lotif wrote:
> I agree with using UTF-8 as the default... We've been using the UTF-8 filter for a while with a couple of different databases (PostgreSQL and SQL Server, encoded with ISO-8859-1) and we've never had any problem with it. +1
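A small sketch of Christian's point, using only JDK APIs (the class name is illustrative). A `char` is a 16-bit UTF-16 code unit, `String.length()` counts those units, and encoding to bytes is a separate, explicit step. Since Java 9, the JVM's compact strings store text whose characters all fit in Latin-1 with one byte per character internally, but the `char` API is still UTF-16.

```java
import java.nio.charset.StandardCharsets;

public class Utf16InMemory {
    public static void main(String[] args) {
        // A Java char is a 16-bit UTF-16 code unit.
        System.out.println(Character.SIZE); // 16

        // Portuguese "nao" with a-tilde: three chars in memory,
        // regardless of the charset it was read from.
        String nao = "n\u00e3o";
        System.out.println(nao.length()); // 3

        // Characters outside the Basic Multilingual Plane need two chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1

        // Encoding to bytes happens at the output boundary; UTF-16BE adds no byte-order mark.
        System.out.println(nao.getBytes(StandardCharsets.UTF_16BE).length); // 6
    }
}
```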
RE: What if Tapestry's I18N was just UTF-8?
Thiago,

Sorry, I don't understand your objection. Could you expand on it, please? Especially where you say there is 'a memory and bandwidth penalty using 2 bytes to encode many characters that would be encoded as 1 in UTF-8'.

In my experience, character encoding can be an absolute nightmare, and having as much as possible in UTF-8 is highly desirable. IIRC Java uses UTF-16 internally, which does use 2 bytes for each char, but UTF-8 only uses 2 (or more) bytes for unusual chars, which is why it's the ideal external charset.

Andy.

-----Original Message-----
From: Thiago H. de Paula Figueiredo
Sent: 29 July 2008 02:59
To: Tapestry users
Subject: Re: What if Tapestry's I18N was just UTF-8?

> Web applications that need accented characters (most Latin languages), but don't need to support other alphabets, will have a memory and bandwidth penalty using 2 bytes to encode many characters that would be encoded as 1 in UTF-8.
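To make Andy's "variable width" point concrete, this sketch (class name illustrative) reports how many bytes a few code points take in UTF-8 and how many 16-bit units they take in UTF-16. ASCII takes 1 byte, Latin accented letters take 2, CJK ideographs such as 中 take 3, and characters outside the Basic Multilingual Plane take 4. On Howard's Big5 question: Big5 uses 2 bytes per Chinese character, so UTF-8 is somewhat larger for Chinese text.

```java
import java.nio.charset.StandardCharsets;

public class EncodedWidths {

    // Number of bytes one code point occupies when encoded as UTF-8.
    static int utf8Width(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII 'a', a-tilde, CJK ideograph U+4E2D, and an emoji outside the BMP.
        int[] samples = {'a', 0x00E3, 0x4E2D, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8 %d byte(s), UTF-16 %d unit(s)%n",
                    cp, utf8Width(cp), Character.charCount(cp));
        }
    }
}
```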
Re: What if Tapestry's I18N was just UTF-8?
UTF-8 as default +1
Re: What if Tapestry's I18N was just UTF-8?
On Tue, 29 Jul 2008 14:44:03 -0300, Blower, Andy wrote:
> Thiago,

Hi!

> Sorry, I don't understand your objection. Could you expand on it, please? Especially where you say there is 'a memory and bandwidth penalty using 2 bytes to encode many characters that would be encoded as 1 in UTF-8'.

Oops, typo on my part. Most Portuguese accented characters are encoded as 2 bytes in UTF-8 and 1 byte in *ISO-8859-1*, AFAIK. So every time I write "não" (no), UTF-8 spends 4 bytes and ISO-8859-1 spends 3.

> In my experience, character encoding can be an absolute nightmare, and having as much as possible in UTF-8 is highly desirable. IIRC Java uses UTF-16 internally, which does use 2 bytes for each char, but UTF-8 only uses 2 (or more) bytes for unusual chars, which is why it's the ideal external charset.

Agreed, but in many languages, characters that are unusual from the point of view of a speaker of English (or any other language without accents) are not unusual at all; they are frequent. I hope I have worded my ideas better now.

Regarding database encodings, I think I got confused. The problem was not the ISO-8859-1-encoded database, but ISO-8859-1-encoded Tapestry templates. Every time an accented character was submitted in a form, I would get 2 characters unless I added accept-charset="iso-8859-1" to every form tag.

Thiago
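Thiago's byte counts, and one plausible mechanism for the two-character symptom, can be checked with the JDK. This sketch assumes the browser sent UTF-8 bytes while the server decoded them as ISO-8859-1. It does not reproduce Tapestry's actual request handling, and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class AccentedFormDemo {
    public static void main(String[] args) {
        String nao = "n\u00e3o"; // Portuguese "nao" with a-tilde

        // Thiago's arithmetic: 4 bytes in UTF-8, 3 in ISO-8859-1.
        byte[] utf8 = nao.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = nao.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " vs " + latin1.length); // 4 vs 3

        // If UTF-8 bytes are decoded as ISO-8859-1, the two-byte sequence for the
        // accented letter (0xC3 0xA3) turns into two separate characters.
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());                 // 4
        System.out.println(misdecoded.equals("n\u00c3\u00a3o")); // true
    }
}
```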
Re: What if Tapestry's I18N was just UTF-8?
+1 on this one.

-Filip

On 2008-07-29 16:39, Howard Lewis Ship wrote:
> My observation is that the current design; allowing every page to have its own charset, is beginning to feel like overkill, especially given that the solution has a number of frayed edges.
Re: What if Tapestry's I18N was just UTF-8?
I'm already coding it up, but I created a pre-change tag just in case it turns out to be a problem. I think this is the right approach.

On Tue, Jul 29, 2008 at 11:54 AM, Filip S. Adamsen wrote:
> +1 on this one.

--
Howard M. Lewis Ship
Creator Apache Tapestry and Apache HiveMind
Re: What if Tapestry's I18N was just UTF-8?
+1 from me too.

Uli

Filip S. Adamsen wrote:
> +1 on this one.
Re: What if Tapestry's I18N was just UTF-8?
+1 me too

On Tue, Jul 29, 2008 at 1:04 PM, Ulrich Stärk wrote:
> From me too.

--
Joshua Long
Sun Certified Java Programmer
http://www.joshlong.com/
What if Tapestry's I18N was just UTF-8?
Here's a question. I'm still struggling with getting Tapestry to do the right encoding when producing output, and to set the response encoding to the correct value before reading query parameters. There are lots of edge cases, related to Ajax, form uploads, and complex components such as BeanEditForm, where content may be gathered from multiple pages.

What if there was just a single default application character set, which would default to UTF-8? This is pretty much what people are doing with the UTF-8 RequestHandler filter. This would simplify a bunch of stuff, since output encoding would always be the same, as would request encoding. We could get rid of some of the metadata as well.

Is UTF-8 sufficiently well supported by browsers? Is this an option that works for Big5 Chinese and other non-Western language locales?

--
Howard M. Lewis Ship
Creator Apache Tapestry and Apache HiveMind
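A sketch of why the request charset has to be settled before query parameters are read: `%XX` escapes are bytes, so the same query string decodes to different text under different charsets. `APPLICATION_CHARSET` and `parseQuery` are hypothetical stand-ins for the single application-wide setting proposed above, not Tapestry APIs. `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryDecoding {

    // Hypothetical single application-wide charset, defaulting to UTF-8.
    static final Charset APPLICATION_CHARSET = StandardCharsets.UTF_8;

    // Parse an "a=1&b=2" query string. The charset must be known before decoding,
    // because each %XX escape is a raw byte, not a character.
    static Map<String, String> parseQuery(String rawQuery, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // A UTF-8 form submits U+00E3 (a with tilde) as the two bytes %C3%A3.
        String query = "q=n%C3%A3o&page=2";

        Map<String, String> decoded = parseQuery(query, APPLICATION_CHARSET);
        System.out.println(decoded.get("q").equals("n\u00e3o")); // true
        System.out.println(decoded.get("page"));                 // 2
    }
}
```

With one application charset, the same constant can drive both the page output and this decoding step, so there is no per-page lookup to get wrong.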
Re: What if Tapestry's I18N was just UTF-8?
On Mon, 28 Jul 2008 21:17:11 -0300, Howard Lewis Ship wrote:
> What if there was just a single default application character set, which would default to UTF-8?

This is not a nice option. Web applications that need accented characters (most Latin languages), but don't need to support other alphabets, will have a memory and bandwidth penalty using 2 bytes to encode many characters that would be encoded as 1 in UTF-8.

In addition, I had some problems with Tapestry 5 using UTF-8 with existing ISO-8859-1 MySQL database tables. Accented characters were always stored as two garbled ones. Maybe I didn't spend enough time solving it (I was doing some consultancy that had a fixed end date), but this could be a huge problem for Tapestry adoption in countries that speak Latin languages.

My two Brazilian (Portuguese-speaking, with many accented characters) cents,

Thiago