Hi, there.

I tried several ways to figure it out.

In void XMLFormatter::writeBOM(const XMLByte* const toFormat, const XMLSize_t
count), toFormat is the BOM as a sequence of bytes, and count is the number of
bytes in that sequence (for a byte array, sizeof the array).
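To make the role of count concrete, here is a small sketch in plain C++ (no Xerces needed) that lists the BOM bytes for a few encodings. The helper name bomBytes is hypothetical and not part of Xerces. Note that the UTF-32 forms contain 0x00 bytes, so a BOM cannot in general be passed as a null-terminated string, which is presumably why writeBOM takes an explicit count.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces API): returns the bytes of the BOM
// (U+FEFF) for a few encoding names. The result is a plain byte array, not a
// C string: it has no terminator, and the UTF-32 forms contain 0x00 bytes,
// so strlen() could not recover its length.
std::vector<unsigned char> bomBytes(const std::string& encoding) {
    if (encoding == "UTF-8")    return {0xEF, 0xBB, 0xBF};
    if (encoding == "UTF-16LE") return {0xFF, 0xFE};
    if (encoding == "UTF-16BE") return {0xFE, 0xFF};
    if (encoding == "UTF-32LE") return {0xFF, 0xFE, 0x00, 0x00};
    if (encoding == "UTF-32BE") return {0x00, 0x00, 0xFE, 0xFF};
    return {};  // unknown encoding: no BOM
}
```

Since XMLByte is a typedef for unsigned char in Xerces-C, the result could then be passed as pFormatter->writeBOM(bom.data(), bom.size()).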

For UTF-8 (which does not need a BOM), the code is as follows:
XMLByte bom[] = { 0xEF, 0xBB, 0xBF };
pFormatter->writeBOM(bom, sizeof bom);

For UTF-16 and UTF-32, we don't have to use writeBOM(...) at all. The only
code needed is:
*pFormatter << (XMLCh) 0xFEFF;
The formatter transcodes this character according to its encoding setting, so
each byte of the BOM automatically ends up in the right position, and we don't
have to care about the byte order of the BOM ourselves.
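Why the byte order takes care of itself can be illustrated with a tiny sketch of what a UTF-16 transcoder does with a single code unit. encodeUtf16Unit is a hypothetical helper, not a Xerces function; it only shows that the same character U+FEFF comes out as FF FE for UTF-16LE and as FE FF for UTF-16BE.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration (not part of Xerces): serializes one
// UTF-16 code unit. The byte order is decided by the target encoding, not by
// the caller, which is why writing (XMLCh) 0xFEFF is enough.
std::vector<std::uint8_t> encodeUtf16Unit(char16_t unit, bool littleEndian) {
    const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
    const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
    if (littleEndian) return {lo, hi};  // UTF-16LE: low byte first
    return {hi, lo};                    // UTF-16BE: high byte first
}
```

This is also how a reader tells the two byte orders apart: if the file starts with FF FE, it is little-endian; if it starts with FE FF, it is big-endian.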

Thanks.
Youngho.
  ----- Original Message ----- 
  From: PARK Youngho 
  To: [email protected] 
  Sent: Saturday, October 24, 2009 12:15 PM
  Subject: Re: Re: Re: Encoding problem in using XMLFormatter and LocalFileFormatTarget


  Hi, there.
  Thanks.
  As you said, I searched the Internet for some information on the BOM, tried
  writing the BOM, and it worked well.

  I have another question about writing a BOM with Xerces.
  I found the following member function related to the BOM at
  http://xerces.apache.org/xerces-c/apiDocs-3/classXMLFormatter.html#8d252614d23d50de4034740c9bae544d
  void XMLFormatter::writeBOM(const XMLByte* const toFormat, const XMLSize_t count).

  But the page does not give any information about that member function. What
  is toFormat for, and what is count for? Is count the length of the string
  toFormat? If toFormat is a null-terminated string, why is count needed? A BOM
  is just a single XMLCh character, so why does writeBOM(...) take a string?

  Thanks.
  Youngho.
    ----- Original Message ----- 
    From: John Lilley 
    To: [email protected] 
    Sent: Friday, October 23, 2009 8:02 PM
    Subject: RE: Re: Encoding problem in using XMLFormatter and LocalFileFormatTarget


    You can also try writing the BOM, which should tell Notepad and other
    programs that the file is UTF-16.
    john
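How a BOM lets a program recognize UTF-16 can be sketched as follows. sniffBom is a hypothetical helper and not Notepad's actual detection logic (which this thread does not describe); it simply compares the first bytes of a file against the known BOM sequences.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guesses the encoding of a buffer from a leading BOM.
// UTF-32LE is checked before UTF-16LE because its BOM (FF FE 00 00) starts
// with the UTF-16LE BOM (FF FE). Without a BOM it returns "unknown".
std::string sniffBom(const unsigned char* data, std::size_t size) {
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "unknown";
}
```

When no BOM is present, the program has to guess or be told the encoding, which is the situation where the user must pick it in the Open dialog.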

    -----Original Message-----
    From: PARK Youngho [mailto:[email protected]] 
    Sent: Friday, October 23, 2009 5:13 AM
    To: [email protected]
    Subject: Re: Re: Encoding problem in using XMLFormatter and LocalFileFormatTarget

    Dear Alberto Massari,

    Thank you so much for your advice.
    You are right.

    As you suggested, I opened the file again in Notepad with the Unicode
    encoding option selected, and it now displays the UTF-16 text file correctly.

    I thought I had misunderstood how to use the Xerces-C classes. I have
    struggled with this for more than two months, and I was about to give up on
    using Xerces-C with any encoding other than UTF-8.

    Thank you so so so so so much!
    Youngho.
      ----- Original Message ----- 
      From: Alberto Massari 
      To: [email protected] 
      Sent: Friday, October 23, 2009 2:37 PM
      Subject: Re: Encoding problem in using XMLFormatter and LocalFileFormatTarget


      When Notepad opens a file, it doesn't know which encoding was used;
      have you selected the "Unicode" encoding option in the Open File dialog?

      Alberto
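To see why Notepad showed "A B C 1 2 3 X? " in the message quoted below, here is a sketch of the bytes a UTF-16LE writer produces for "ABC123하늘" (without a BOM). toUtf16LEBytes is a hypothetical helper, not Xerces code. Every ASCII character is followed by a 0x00 byte, which an editor that does not know the file is UTF-16 may display as a blank, and the stray "X" plausibly comes from 0x58, the low byte of 하 (U+D558).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serializes a UTF-16 string as little-endian bytes
// with no BOM, i.e. the byte layout a "UTF-16LE" formatter is expected to
// write for BMP characters.
std::vector<std::uint8_t> toUtf16LEBytes(const std::u16string& text) {
    std::vector<std::uint8_t> out;
    out.reserve(text.size() * 2);
    for (char16_t unit : text) {
        out.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte
        out.push_back(static_cast<std::uint8_t>(unit >> 8));    // high byte
    }
    return out;
}
```

For u"ABC123\uD558\uB298" ("ABC123하늘") this gives 41 00 42 00 43 00 31 00 32 00 33 00 58 D5 98 B2.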

      PARK Youngho wrote:
      > Hi, there.
      > I am using Xerces-C 3.0.1.
      > I have an encoding problem using XMLFormatter and LocalFileFormatTarget.
      > If I use UTF-8, everything is fine. However, if I use UTF-16, the
      > alphanumeric characters appear to be written correctly but the Korean
      > characters are not. With XMLFormatter and LocalFileFormatTarget, I tried
      > to write a file containing the string "ABC123하늘" ("하늘" is a Korean
      > word). The UTF-8 file looks correct when I open it in Notepad, but the
      > UTF-16 file does not: "A B C 1 2 3 X? " is shown instead of "ABC123하늘".
      >
      > Is there something I am missing?
      >
      > The code is as follows.
      >
      > #include "stdafx.h"
      > #include <conio.h>
      > #include <iostream>
      > #include <xercesc/util/PlatformUtils.hpp>
      > #include <xercesc/framework/XMLFormatter.hpp>
      > #include <xercesc/framework/LocalFileFormatTarget.hpp>
      >
      > using namespace xercesc;
      > using namespace std;
      >
      > int _tmain(int argc, _TCHAR* argv[])
      > {
      >     XMLPlatformUtils::Initialize();
      >     LocalFileFormatTarget* pTarget = new LocalFileFormatTarget(L"testUTF16.txt");
      >     XMLFormatter*   pFormatter = new XMLFormatter(L"UTF-16LE", pTarget);
      >     *pFormatter << XMLFormatter::NoEscapes << L"ABC123하늘";
      >     pTarget->flush();
      >     delete pFormatter;
      >     delete pTarget;
      >     pTarget = new LocalFileFormatTarget(L"testUTF8.txt");
      >     pFormatter = new XMLFormatter(L"UTF-8", pTarget);
      >     *pFormatter << XMLFormatter::NoEscapes << L"ABC123하늘";
      >     pTarget->flush();
      >     delete pFormatter;
      >     delete pTarget;
      >
      >     XMLPlatformUtils::Terminate();
      >     return 0;
      > }
      >
      > Thanks.
      > Youngho.
      >
      > __(*^.^*)__
