Hi,
I actually need the whole text, including the ZWJ. I am attaching my code,
but only the section that interacts with the XML file; I hope that is
enough. The code is fairly long and I haven't commented it properly, so it
may take a little time to understand. If you need an explanation of any
part, please let me know.
Cheers,
Jinesh K J
On Nov 28, 2007 5:43 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> The file you attached is correct, and the same modified DOMPrint that I
> used before returns the ZWJ characters in the content of getTextContent.
> Could you show us the code you are using to read the file?
>
> Alberto
>
> jinesh kj wrote:
> > hi,
> >
> > I dumped the database using the mysql -X command, which gives me the
> > output as an XML file. I don't know whether there is any problem with my
> > XML files. Is there any specific notation to represent the ZWJ and ZWNJ
> > in XML files?
> >
> > I am attaching one of my XML files.
> >
> > Thank you for your help. If you have a better idea of what to do with
> > the XML file when I get characters like these, or any links to such
> > details, please point me to them.
> >
> > regards
> >
> > Jinesh K J
> >
> > On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]
> > <mailto:[EMAIL PROTECTED]>> wrote:
> >
> > If you can read the original file, but not when you edit it, I would bet
> > the reason is in the way you edit your XML files (and dump from the
> > database). What are you using? Could you attach a small sample file?
> >
> > Alberto
> >
> > jinesh kj wrote:
> > > hi,
> > >
> > > I tried reading the file you sent. It didn't give any error, which
> > > means it was read correctly. I don't know how to check in the
> > > debugger, so I don't know whether it read 200D or not. But if I edit
> > > the XML file to include some text data along with it, the text is not
> > > read. Do I have to do anything for that? Basically I am trying to read
> > > through an XML file which is a dump of a MySQL database. It has many
> > > ZWJ characters. I don't know whether it conforms to the specified
> > > encoding, but since it was dumped from the database using the built-in
> > > function, I think the chance of an error is low.
> > >
> > > I am using a similar function in my program, but it returns nothing
> > > when there is a ZWJ in my data.
> > >
> > > I hope I am clear. I am able to read XML files without ZWJ easily.
> > >
> > > regards
> > >
> > > Jinesh K J
> > >
> > > On Nov 28, 2007 4:02 PM, Alberto Massari
> > <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
> > >
> > >
> > >> I am attaching a sample XML that contains a U+200D character between a
> > >> --| and |-- pattern; I modified DOMPrint to issue a
> > >>
> > >> const XMLCh* data=doc->getDocumentElement()->getTextContent();
> > >>
> > >> and in the debugger I see that data[4] is \x200D.
> > >> Have you checked your source XML really has that character? Also, is
> > >> the representation of the ZWJ character in the XML file valid according
> > >> to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
> > >>
> > >> Alberto
> > >>
> > >> jinesh kj wrote:
> > >>
> > >>> hi,
> > >>>
> > >>> Actually, getTextContent is not returning any value when there is a
> > >>> Zero Width Joiner.
> > >>>
> > >>> cheers
> > >>>
> > >>> Jinesh K J
> > >>>
> > >>> On Nov 28, 2007 3:28 PM, Alberto Massari
> > >>> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
> > >>>
> > >>>> Hi Jinesh,
> > >>>> which kind of issues are you having? The text returned by
> > >>>> getTextContent should contain a \x200D value inside. Or have you
> > >>>> transcoded it into chars?
> > >>>>
> > >>>> Alberto
> > >>>>
> > >>>> jinesh kj wrote:
> > >>>>
> > >>>>
> > >>>>> hi all,
> > >>>>>
> > >>>>> I was trying to read from an XML file where some data contain a
> > >>>>> Zero Width Joiner. I used getTextContent in DOMNode. I was able to
> > >>>>> read contents that have no Zero Width Joiner, but there are some
> > >>>>> issues with these special characters. What do I have to change? Do
> > >>>>> I have to make any special settings? Or do I have to use another
> > >>>>> function instead?
> > >>>>>
> > >>>>> cheers
> > >>>>> Jinesh K J
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >>
> > >
> > >
> > >
> >
> >
> >
> >
> > --
> > My Feelings,Expressions-
> > http://logbookofanobserver.blogspot.com
> >
> > SMC : My computer, My language http://smc.org.in
> > സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
>
>
--
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com
SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
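One way to answer the earlier question of whether the dumped file really contains ZWJ is to look for its UTF-8 byte sequence (0xE2 0x80 0x8D) in the raw file. The following is only a rough sketch, assuming the dump is UTF-8 encoded; countZwj and readFileBytes are hypothetical helper names, not part of Xerces or MySQL.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Count occurrences of the UTF-8 encoding of U+200D (ZWJ) in a byte buffer.
// Assumption: the data is UTF-8; other encodings use different bytes.
std::size_t countZwj(const std::string& bytes)
{
    static const std::string zwj = "\xE2\x80\x8D";
    std::size_t count = 0;
    for (std::size_t pos = bytes.find(zwj);
         pos != std::string::npos;
         pos = bytes.find(zwj, pos + zwj.size()))
    {
        ++count;
    }
    return count;
}

// Read a whole file as raw bytes. Returns an empty string if the file
// cannot be opened.
std::string readFileBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling countZwj(readFileBytes(path)) on the dump and getting zero would suggest the characters were lost before parsing; a non-zero count points at the reading code instead.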
void GetConfig::readConfigFile(string& configFile)
    throw( std::runtime_error )
{
    int flag = 0;
    // Test to see if the file is ok. stat() returns -1 on failure and
    // reports the cause in errno (needs <cerrno>), not in its return value.
    struct stat fileStatus;
    if( stat(configFile.c_str(), &fileStatus) == -1 )
    {
        if( errno == ENOENT )
            throw ( std::runtime_error("Path file_name does not exist, or path is an empty string.") );
        else if( errno == ENOTDIR )
            throw ( std::runtime_error("A component of the path is not a directory."));
        else if( errno == ELOOP )
            throw ( std::runtime_error("Too many symbolic links encountered while traversing the path."));
        else if( errno == EACCES )
            throw ( std::runtime_error("Permission denied."));
        else if( errno == ENAMETOOLONG )
            throw ( std::runtime_error("File name is too long."));
        else
            throw ( std::runtime_error("File can not be read\n"));
    }
    //Configure DOM parser
    m_ConfigFileParser->setValidationScheme( XercesDOMParser::Val_Never );
    m_ConfigFileParser->setDoNamespaces( false );
    m_ConfigFileParser->setDoSchema( false );
    m_ConfigFileParser->setLoadExternalDTD( false );
    try
    {
        m_ConfigFileParser->parse( configFile.c_str() );
        // no need to free this pointer - owned by the parent parser object
        DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();
        // Get the top-level element.
        DOMElement* elementRoot = xmlDoc->getDocumentElement();
        if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));
        // Get the children of the root node.
        // Get the size of the list so as to go through the list.
        DOMNodeList* children = elementRoot->getChildNodes();
        const XMLSize_t nodeCount = children->getLength();
        // Go through the list of children till the end (each child is a row).
        for( XMLSize_t xx = 0; xx < nodeCount; ++xx )
        {
            DOMNode* CurrentNode = children->item(xx);
            // Get the children of the current node (the fields of each row).
            DOMNodeList* CurrentChildren = CurrentNode->getChildNodes();
            const XMLSize_t childCount = CurrentChildren->getLength();
            // Go through the list of fields of each row.
            for(XMLSize_t yy = 0; yy < childCount; ++yy)
            {
                DOMNode* currentField = CurrentChildren->item(yy);
                if( currentField->getNodeType() == DOMNode::ELEMENT_NODE ) // is element
                {
                    // Found node which is an Element. Re-cast node as element
                    DOMElement* currentElement
                        = dynamic_cast< xercesc::DOMElement* >( currentField );
                    // Check the 'name' attribute and compare it with the required fields.
                    // If it matches, get the content with getTextContent().
                    // transcode() allocates, so release the temporary name afterwards.
                    XMLCh* attrName = XMLString::transcode("name");
                    const XMLCh* xmlch_name = currentElement->getAttribute(attrName);
                    XMLString::release(&attrName);
                    if(XMLString::equals(xmlch_name,ATTR_Block)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        location.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_Line)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        location.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_Word)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        location.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_rectLeft)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        bounds.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_rectRight)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        bounds.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_rectTop)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        bounds.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_rectBot)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        bounds.push_back(atoi(m_OptionA));
                    }
                    if(XMLString::equals(xmlch_name,ATTR_BookCode))
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        bookCode = atoi(m_OptionA);
                        if(strcmp(m_OptionA,code)==0) flag=1;
                        else flag =0;
                    }
                    if(XMLString::equals(xmlch_name,ATTR_Text)&&flag==1)
                    {
                        // DOMText* textdata = dynamic_cast< xercesc::DOMText* >( currentField );
                        // const XMLCh* value = textdata->getTextContent();
                        const XMLCh* value = currentElement->getTextContent();
                        // XMLString::transcode() converts to the local code page. If that
                        // code page cannot represent U+200D (ZWJ), the conversion may fail
                        // and leave text empty; transcoding to UTF-8 would avoid this.
                        text = XMLString::transcode(value);
                    }
                    if(XMLString::equals(xmlch_name,ATTR_PageNo))
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        m_OptionA = XMLString::transcode(value);
                        page = atoi(m_OptionA);
                        if(page==pageno) flag=1;
                        else flag = 0;
                    }
                    if(XMLString::equals(xmlch_name,ATTR_Location)&&flag==1)
                    {
                        const XMLCh* value = currentElement->getTextContent();
                        // value is UTF-16 (XMLCh*); streaming it to cout would not print
                        // the text, so only the transcoded copy is printed.
                        loc = XMLString::transcode(value);
                        cout<<loc<<endl;
                    }
                }
            }
        }
    }
    catch( xercesc::XMLException& e )
    {
        char* message = xercesc::XMLString::transcode( e.getMessage() );
        ostringstream errBuf;
        errBuf << "Error parsing file: " << message << flush;
        XMLString::release( &message );
        // Report the failure instead of silently discarding the message.
        throw std::runtime_error( errBuf.str() );
    }
}
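
An empty result for text that contains ZWJ would be consistent with XMLString::transcode() converting to the local code page, which may not be able to represent U+200D. As a rough illustration of the alternative, the helper below encodes UTF-16 code units directly as UTF-8, so U+200D comes out as the bytes 0xE2 0x80 0x8D mentioned earlier in the thread. This is standalone C++ and not Xerces code: utf16ToUtf8 is a hypothetical name, and char16_t stands in for XMLCh on the assumption that XMLCh is a 16-bit UTF-16 code unit.

```cpp
#include <cstddef>
#include <string>

// Encode a UTF-16 string as UTF-8 bytes.
// Assumption: char16_t plays the role of XMLCh (a 16-bit UTF-16 code unit).
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a surrogate pair into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            // Unpaired surrogate: substitute U+FFFD (replacement character).
            cp = 0xFFFD;
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            // U+200D falls in this range and becomes E2 80 8D.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike a conversion through the local code page, this never drops a character because the locale cannot represent it; the resulting std::string holds UTF-8 bytes and should be treated as such by whatever consumes it.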