Hi, there.
I tried several approaches and worked it out.
For the function void XMLFormatter::writeBOM(const XMLByte *const toFormat,
const XMLSize_t count ), toFormat is the sequence of BOM bytes and count is
the number of bytes in it (sizeof toFormat when toFormat is an array).
UTF-8 does not need a BOM, but if you want to write one, the necessary code
is as follows:
XMLByte bom[] = { 0xEF, 0xBB, 0xBF };
pFormatter->writeBOM(bom, sizeof bom);
For UTF-16 and UTF-32, we don't have to use the function writeBOM(...).
The necessary code is only as follows:
*pFormatter << (XMLCh) 0xFEFF;
The formatter transcodes the character according to its encoding setting, so
the bytes come out in the right order automatically. So, we don't have to
care about the byte order of the BOM.
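To see why the byte order takes care of itself, here is a minimal sketch in
plain C++ with no Xerces calls. encodeBmpChar is a hypothetical helper, not a
Xerces-C++ API; it only illustrates which bytes the single character U+FEFF
turns into under each encoding name, which is the kind of result I would
expect from the formatter's transcoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not a Xerces-C++ API: serialises one BMP character
// into the bytes of the named encoding. Only the cases needed for U+FEFF
// are covered.
std::vector<std::uint8_t> encodeBmpChar(std::uint16_t cp, const std::string& enc)
{
    const std::uint8_t hi = static_cast<std::uint8_t>(cp >> 8);
    const std::uint8_t lo = static_cast<std::uint8_t>(cp & 0xFF);

    if (enc == "UTF-16LE") return { lo, hi };
    if (enc == "UTF-16BE") return { hi, lo };
    if (enc == "UTF-32LE") return { lo, hi, 0x00, 0x00 };
    if (enc == "UTF-32BE") return { 0x00, 0x00, hi, lo };

    // Anything else must be UTF-8. This sketch handles only the three-byte
    // range U+0800..U+FFFF, which is where U+FEFF lives.
    assert(enc == "UTF-8" && cp >= 0x0800);
    return {
        static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
        static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F))
    };
}
```

With this helper, encodeBmpChar(0xFEFF, "UTF-16LE") gives FF FE,
"UTF-16BE" gives FE FF, and "UTF-8" gives EF BB BF, the same three bytes
that are passed to writeBOM above.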
thanks.
Youngho.
----- Original Message -----
From: PARK Youngho
To: [email protected]
Sent: Saturday, October 24, 2009 12:15 PM
Subject: Re: Re: Re: Encoding problem in using XMLFormatter and
LocalFileFormatTarget
Hi, there.
Thanks.
As you said, I searched the Internet for some information on the BOM, tried
to write the BOM, and it worked well.
I have one more question about the BOM in Xerces.
I found a class member function related with BOM at
http://xerces.apache.org/xerces-c/apiDocs-3/classXMLFormatter.html#8d252614d23d50de4034740c9bae544d
as follows:
void XMLFormatter::writeBOM(const XMLByte *const toFormat, const XMLSize_t
count ).
But, the webpage does not give any information about that member function.
What is toFormat for? And what is count for? Is count the length of the
string toFormat? If toFormat is a null-terminated string, why is count
needed? The BOM is just a single XMLCh character, so why does the function
writeBOM(...) take a string?
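On the count question, one likely reason for an explicit length is that BOM
byte sequences are raw bytes rather than C strings, and some of them contain
zero bytes. The sketch below is plain C++ with no Xerces calls, and
lengthAsCString and kUtf32BeBom are hypothetical names. It shows that a
null-terminated length is useless for the UTF-32BE BOM, 00 00 FE FF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// UTF-32BE BOM: it starts with two zero bytes.
const unsigned char kUtf32BeBom[] = { 0x00, 0x00, 0xFE, 0xFF };

// Hypothetical helper: the length a C-string API would see, i.e. the
// number of bytes before the first zero byte.
std::size_t lengthAsCString(const unsigned char* bytes)
{
    return std::strlen(reinterpret_cast<const char*>(bytes));
}
```

Here lengthAsCString(kUtf32BeBom) is 0 while sizeof kUtf32BeBom is 4, so
only an explicit byte count describes the BOM correctly.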
Thanks.
Youngho.
----- Original Message -----
From: John Lilley
To: [email protected]
Sent: Friday, October 23, 2009 8:02 PM
Subject: RE: Re: Encoding problem in using XMLFormatter and
LocalFileFormatTarget
You can also try writing the BOM, which should tell notepad and other
programs about UTF-16.
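As a rough illustration of how a reading program can use the BOM, here is a
minimal sketch of BOM sniffing in plain C++. detectBom is a hypothetical
helper; I am not claiming this is how Notepad or any other program does it
internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: names the encoding implied by a leading BOM, or
// returns "" when the data does not start with one.
std::string detectBom(const std::vector<std::uint8_t>& data)
{
    auto startsWith = [&data](std::initializer_list<std::uint8_t> sig) {
        return data.size() >= sig.size()
            && std::equal(sig.begin(), sig.end(), data.begin());
    };

    // UTF-32LE must be tested before UTF-16LE because its BOM also begins
    // with FF FE. (A UTF-16LE file whose first character is U+0000 would
    // be misread here; this sketch ignores that corner case.)
    if (startsWith({0xFF, 0xFE, 0x00, 0x00})) return "UTF-32LE";
    if (startsWith({0x00, 0x00, 0xFE, 0xFF})) return "UTF-32BE";
    if (startsWith({0xEF, 0xBB, 0xBF}))       return "UTF-8";
    if (startsWith({0xFF, 0xFE}))             return "UTF-16LE";
    if (startsWith({0xFE, 0xFF}))             return "UTF-16BE";
    return "";
}
```

For a file that begins FF FE 41 00 this returns "UTF-16LE"; without the
BOM it returns "" and the reader has to guess or be told the encoding.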
john
-----Original Message-----
From: PARK Youngho [mailto:[email protected]]
Sent: Friday, October 23, 2009 5:13 AM
To: [email protected]
Subject: Re: Re: Encoding problem in using XMLFormatter and
LocalFileFormatTarget
Dear Alberto Massari.
Thank you so much for your advice.
You are right.
As you said, I opened it again in Notepad with the Unicode encoding option,
and the UTF-16 text file now opens correctly.
I thought that I had not understood how to use the Xercesc classes
correctly. I have struggled with it for more than 2 months, and I was about
to give up on using Xercesc with encodings other than UTF-8.
Thank you so so so so so much !
Youngho.
----- Original Message -----
From: Alberto Massari
To: [email protected]
Sent: Friday, October 23, 2009 2:37 PM
Subject: Re: Encoding problem in using XMLFormatter and
LocalFileFormatTarget
When notepad opens a file it doesn't know which encoding has been used;
have you specified the "Unicode" encoding option in the open file dialog?
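To make the symptom concrete, here is a minimal sketch in plain C++ (no
Xerces calls; toUtf16Le is a hypothetical helper) of the bytes that a
UTF-16LE file holds. Every ASCII letter is followed by a zero byte, which
would be consistent with "A B C 1 2 3" appearing when the file is read as a
single-byte encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: writes each UTF-16 code unit low byte first,
// which is the UTF-16LE byte layout.
std::vector<std::uint8_t> toUtf16Le(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes;
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF)); // low byte
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));   // high byte
    }
    return bytes;
}
```

toUtf16Le(u"ABC") yields 41 00 42 00 43 00, and the two Korean characters
U+D558 U+B298 yield 58 D5 98 B2, which have no sensible meaning when each
byte is shown as its own character.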
Alberto
PARK Youngho wrote:
> Hi, there.
> I am using Xercesc 3.0.1.
> I have an encoding problem when using XMLFormatter and
> LocalFileFormatTarget. If I use UTF-8, everything is fine. However, if I
> use UTF-16, alphanumeric characters appear to be written correctly but
> Korean characters are not. With XMLFormatter and LocalFileFormatTarget, I
> tried to write a file containing the string "ABC123하늘", where "하늘" is a
> Korean string. The file written as UTF-8 looked fine when I opened it with
> Notepad, but the file written as UTF-16 did not: it showed
> "A B C 1 2 3 X? " instead of "ABC123하늘".
>
> Is there something I am missing?
>
> The code is as follows.
>
> #include "stdafx.h"
> #include <conio.h>
> #include <iostream>
> #include <xercesc/framework/XMLFormatter.hpp>
> #include <xercesc/framework/LocalFileFormatTarget.hpp>
>
> using namespace xercesc;
> using namespace std;
>
> int _tmain(int argc, _TCHAR* argv[])
> {
>     XMLPlatformUtils::Initialize();
>     LocalFileFormatTarget* pTarget = new LocalFileFormatTarget(L"testUTF16.txt");
>     XMLFormatter* pFormatter = new XMLFormatter(L"UTF-16LE", pTarget);
>     *pFormatter << XMLFormatter::NoEscapes << L"ABC123하늘";
>     pTarget->flush();
>     delete pFormatter;
>     delete pTarget;
>     pTarget = new LocalFileFormatTarget(L"testUTF8.txt");
>     pFormatter = new XMLFormatter(L"UTF-8", pTarget);
>     *pFormatter << XMLFormatter::NoEscapes << L"ABC123하늘";
>     pTarget->flush();
>     delete pFormatter;
>     delete pTarget;
>
>     XMLPlatformUtils::Terminate();
>     return 0;
> }
>
> Thanks.
> Youngho.
>
> __(*^.^*)__