Re: Error when using XSL with French Characters
Hi,

Andreas Delmelle wrote:
> On Sep 3, 2008, at 18:35, Steffanina, Jeff wrote:
>
> Hi Jeff
>
>> There is always one MORE option to consider!! What would you suggest as the best way to handle this?
>
> I think I'd opt for using (N)umeric (C)haracter (R)eferences. Reasoning would be that if one changes the BASIC code to emit the sequence '&#xE8;', this will never, ever have to be changed (unless Unicode would somehow decide on altering the codepoints). You can change the encoding in the XML header all you want, NCRs will always work.
>
> On the other hand, if you have a LOT of those characters, using NCRs could make your XML a bit bulky (instead of 1 byte/character, you actually generate 6-8 bytes to represent one character in the final result; the XML parser, instead of needing only one byte, has to parse all bytes from '&' up to and including ';').

Not to mention that this would make the document really tedious to type, and not very readable...

> The character code you mentioned earlier (130) is the decimal value for 'é' in ASCII, so if you're concerned with the size of the XML and do not want to generate 6 bytes for one character, try specifying US-ASCII as encoding for the source XML.

No, US-ASCII is a 7-bit character set, which means it can contain only 128 characters, none of them being an accented letter [1].

From your other message it looks like the default character set on your system is ISO-8859-15, which is OK for all of the western languages plus a few more [2]. Your BASIC program probably uses that character set, in which case you just have to change the header of your XML file:

    <?xml version="1.0" encoding="ISO-8859-15"?>

As long as you put the right header in the XML file you can live with that setup. However, it is safer to switch to UTF-8 now, in order to avoid trouble in the future. Indeed, it’s probable that when you change your computer or upgrade your system the default character set will become UTF-8. Then if you re-edit that file on the new system, accented letters will be entered as UTF-8 sequences that are incompatible with ISO-8859-15, and you’ll basically see garbage in the result. Unless your editor is elaborate enough to recognize that the file is XML and parses the header to get its encoding. But I doubt many editors do that...

You can choose to convert your files to UTF-8 later on, but that might represent a lot of work, plus you will have to edit every file to change the XML header to UTF-8. Since the use of UTF-8 as the default charset will happen sooner or later, you had better do that now, while you don’t have too many files.

Changing the default character set is very system-dependent. Basically you have to adjust the locale environment variables (LANG, LC_ALL, LC_CTYPE). You may be able to get a list of available locales by typing the following command in a terminal:

    $ locale -a
    C
    en_US.iso885915
    en_US.utf8
    ...

If no UTF-8 locale is available it must be generated. Try to find documentation for your system or ask the system administrator if applicable...

You find that complicated? It is, it has always been, and I’m afraid it may forever be. This is historical...

[1] http://en.wikipedia.org/wiki/Ascii
[2] http://en.wikipedia.org/wiki/ISO/IEC_8859-15

HTH,
Vincent

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
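The conversion step described above (re-encode the file and change the XML header) could be sketched roughly as follows. This is only an illustration in Python, not something taken from the thread: the function name latin9_xml_to_utf8 and the sample document are hypothetical, and it assumes the input really is ISO-8859-15 and starts with an XML declaration.

```python
import re


def latin9_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-15 XML document as UTF-8 and update its declaration."""
    # Assumption: the bytes really are ISO-8859-15 (also known as Latin-9).
    text = data.decode("iso8859_15")
    # Rewrite only the encoding value inside the leading XML declaration.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=["\'])[^"\']+(["\'])',
        r"\g<1>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


# Hypothetical sample: "café" with é stored as the single byte 0xE9.
source = b'<?xml version="1.0" encoding="ISO-8859-15"?><t>caf\xe9</t>'
converted = latin9_xml_to_utf8(source)
print(converted)
```

After the conversion the accented letter occupies two bytes (0xC3 0xA9) and the declaration matches the actual encoding, which is the property the XML parser relies on.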
Re: Error when using XSL with French Characters
On Sep 4, 2008, at 12:06, Vincent Hennebert wrote:

<snip/>

> No, US-ASCII is a 7-bit character set, which means it can contain only 128 characters, none of them being an accented letter [1].

Ouch! Indeed. I'm so used to the basic 7-bit set being extended... To think that I even tried it over here in an editor. If only I had also tried to actually save the file, I would have noticed...

Sorry for the confusion, Jeff. The conclusion is definitely the right one: if you can somehow manage to have the BASIC code write the file as UTF-8, all the encoding hassles disappear.

Cheers

Andreas
RE: Error when using XSL with French Characters
Manuel,

We create the XML using a version of BASIC. To create this particular character, we send CHR(130) to the XML. When I open the XML in vi, I see the proper French symbol.

Jeff

From: Manuel Mall [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 02, 2008 10:51 PM
To: 'fop-users@xmlgraphics.apache.org'
Subject: RE: Error when using XSL with French Characters

I am suspicious that although you declare the XML file as being in UTF-8 it actually isn't. How do you produce the XML file?

Manuel

From: Steffanina, Jeff [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 September 2008 10:23 AM
To: fop-users@xmlgraphics.apache.org
Subject: Error when using XSL with French Characters

My Friends,

Fop-0.95

My style sheet has been working perfectly. However, the user submitted some text in French. In the text was a letter e with an accent above it. That character caused the following error:

    Invalid byte 1 of 1-byte UTF-8 sequence.

My .xml looks fine. The e with the accent above it is perfect.

First line in my XML:

    <?xml version="1.0" encoding="UTF-8"?>

Here is the first line of my XSL:

    <?xml version="1.0" encoding="UTF-8"?>

I am confused about why the UTF-8 for the XML understands the character but the UTF-8 in the XSL does not.

I found an article that suggests that the problem would be solved with:

    <?xml version="1.0" encoding="8859-1"?>

Would this be a viable/recommended solution? Do you have a better idea?
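To make the reported error concrete, here is a small Python sketch of why a lone byte 130 (0x82) cannot appear in a file declared as UTF-8. The sample documents and the helper name is_valid_utf8 are made up for the illustration; the only claim is about UTF-8 itself, where 0x82 is a continuation byte and never a valid first byte of a sequence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical file content: UTF-8 is declared, but the accented letter was
# written as the single byte 130 (0x82), as CHR(130) would produce.
declared_only = b'<?xml version="1.0" encoding="UTF-8"?><t>caf\x82</t>'

# The same document with the letter written as genuine UTF-8 (two bytes).
real_utf8 = '<?xml version="1.0" encoding="UTF-8"?><t>café</t>'.encode("utf-8")

print(is_valid_utf8(declared_only))
print(is_valid_utf8(real_utf8))
```

A check like this on the generated file separates "the declaration is wrong" from "the stylesheet is wrong" before FOP is involved at all.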
Re: Error when using XSL with French Characters
There are four accented forms of e in common use in French (é è ê ë), so you should be more precise. None of them can possibly correspond to CHR(130), in either UTF-8 or ISO-8859-1.

What kind of system/platform/OS are you working on? Mentioning vi makes me guess it is some kind of Unix, but at the same time the encoding used makes this improbable...

I guess more information is needed here.
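For reference, this Python sketch shows what the single byte 130 means under a few encodings. The choice of code pages to try is an assumption made for illustration; the thread does not establish which one the BASIC runtime actually uses. In the old DOS code pages CP437 and CP850 the byte is é, while in ISO-8859-1 and ISO-8859-15 it falls in the C1 control range.

```python
def decode_byte(value: int, codec: str) -> str:
    """Decode one byte value under the given codec and return the character."""
    return bytes([value]).decode(codec)


# Candidate encodings, chosen for illustration only.
for codec in ("cp437", "cp850", "iso8859_1", "iso8859_15"):
    char = decode_byte(130, codec)
    print(f"{codec:11} -> U+{ord(char):04X} {char!r}")

# For comparison, the byte that ISO-8859-1 and ISO-8859-15 use for é.
print("é".encode("iso8859_1"), "é".encode("iso8859_15"))
```

If the generator does use one of the DOS code pages, that would explain a value of 130, but this remains a guess until the platform details are known.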
RE: Error when using XSL with French Characters
Jean-Francois,

fop-0.95
I am running Redhat Linux 2.4.21-47.0.1.

The letter I am referring to is: é è

I assume I am having problems with any French character that includes a glyph.

What are you using for <?xml version="1.0" encoding= ?>

I appreciate any suggestions. I have not had to deal with international character sets before.

Thanks.

Jeff
RE: Error when using XSL with French Characters
Jean-Francois,

On my Linux box I have this entry in /etc/sysconfig/i18n:

    LANG=en_US.iso885915

Jeff Steffanina
FOSSE Development, Bethesda, MD
(301)380-2047
[EMAIL PROTECTED]

This communication contains information from Marriott International, Inc. that may be confidential. Except for personal use by the intended recipient, or as expressly authorized by the sender, any person who receives this information is prohibited from disclosing, copying, distributing, and/or using it. If you have received this communication in error, please immediately delete it and all copies, and promptly notify the sender. Nothing in this communication is intended as an electronic signature under applicable law.
Re: Error when using XSL with French Characters
On Sep 3, 2008, at 15:05, Steffanina, Jeff wrote:

Hi Jeff

> fop-0.95
> I am running Redhat Linux 2.4.21-47.0.1.
>
> The letter I am referring to is: é è
>
> I assume I am having problems with any French character that includes a glyph.
>
> What are you using for <?xml version="1.0" encoding= ?>
>
> I appreciate any suggestions. I have not had to deal with international characters sets before.

If all else fails, remember that XML *always* allows Numeric Character References, like &#x0A; or &#10; for a linefeed (the values are always Unicode code points, whatever the encoding of the file).

The respective character references are:

    &#xE8; - è
    &#xE9; - é

If you output those sequences in the BASIC module, then it should work, regardless of which encoding is specified in the XML header.

HTH!

Cheers

Andreas
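The NCR approach can be sketched in a few lines of Python. The helper to_ncr and the sample text are hypothetical, and the sketch deliberately does not escape markup characters such as & and <, which a real generator would also have to handle. It checks the claim that the same ASCII-only bytes parse to the same text whichever encoding is declared.

```python
import xml.etree.ElementTree as ET


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


escaped = to_ncr("très élégant")
print(escaped)

# The document now contains only ASCII bytes, so the declared encoding
# no longer affects how the accented letters are read.
results = {}
for declared in ("US-ASCII", "ISO-8859-1", "UTF-8"):
    doc = f'<?xml version="1.0" encoding="{declared}"?><t>{escaped}</t>'
    results[declared] = ET.fromstring(doc.encode("ascii")).text
print(results)
```

The BASIC side would only need a lookup from each accented letter to its reference string, since the references themselves are plain ASCII.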
RE: Error when using XSL with French Characters
There is always one MORE option to consider!! What would you suggest as the best way to handle this?

Jeff
Re: Error when using XSL with French Characters
On Sep 3, 2008, at 18:35, Steffanina, Jeff wrote:

Hi Jeff

> There is always one MORE option to consider!! What would you suggest as the best way to handle this?

I think I'd opt for using (N)umeric (C)haracter (R)eferences. Reasoning would be that if one changes the BASIC code to emit the sequence '&#xE8;', this will never, ever have to be changed (unless Unicode would somehow decide on altering the codepoints). You can change the encoding in the XML header all you want, NCRs will always work.

On the other hand, if you have a LOT of those characters, using NCRs could make your XML a bit bulky (instead of 1 byte/character, you actually generate 6-8 bytes to represent one character in the final result; the XML parser, instead of needing only one byte, has to parse all bytes from '&' up to and including ';').

The character code you mentioned earlier (130) is the decimal value for 'é' in ASCII, so if you're concerned with the size of the XML and do not want to generate 6 bytes for one character, try specifying US-ASCII as encoding for the source XML.

HTH!

Andreas
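The size trade-off can be checked with a short Python sketch. The character è and the three representations compared are chosen purely for illustration. The last lines also test whether the letter fits in US-ASCII at all, which is the point taken up in the follow-up messages of this thread.

```python
char = "è"
ncr = f"&#x{ord(char):X};"  # the reference '&#xE8;'

# Bytes needed to store one accented letter in each representation.
sizes = {
    "NCR": len(ncr.encode("ascii")),
    "ISO-8859-15": len(char.encode("iso8859_15")),
    "UTF-8": len(char.encode("utf-8")),
}
print(sizes)

# US-ASCII is a 7-bit set, so the accented letter cannot be encoded in it.
try:
    char.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)
```

For text with only occasional accented letters the overhead of references is small; it grows in step with the share of non-ASCII characters.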
RE: Error when using XSL with French Characters
I am suspicious that although you declare the XML file as being in UTF-8 it actually isn't. How do you produce the XML file?

Manuel