Re: reg:[reading data with ZWJ and ZWNJ]

jinesh kj Wed, 28 Nov 2007 06:56:24 -0800

hi,

I actually need the whole text with the ZWJ. I am attaching my code, but only
the section that interacts with the XML file; I hope that is enough. The code
is fairly long and I haven't commented it properly, so it may take you a little
time to understand. If you need an explanation of any part, please let me
know.
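
For reference, the UTF-8 byte sequence for ZWJ given in the quoted message below (0xE2 0x80 0x8D) can be looked for directly in the raw file contents, before any XML parsing. This is only a minimal sketch: countSequence and the two constants are hypothetical helpers, not part of Xerces, and 0xE2 0x80 0x8C for ZWNJ (U+200C) is the standard UTF-8 encoding rather than something stated in this thread.

```cpp
#include <cstddef>
#include <string>

// UTF-8 encodings of the two zero-width characters.
const std::string kZwjUtf8  = "\xE2\x80\x8D"; // U+200D ZERO WIDTH JOINER
const std::string kZwnjUtf8 = "\xE2\x80\x8C"; // U+200C ZERO WIDTH NON-JOINER

// Count non-overlapping occurrences of `needle` in the byte string `utf8`.
// (Hypothetical helper for illustration only.)
std::size_t countSequence(const std::string& utf8, const std::string& needle)
{
    if (needle.empty())
        return 0; // an empty pattern would otherwise never advance

    std::size_t count = 0;
    std::size_t pos = utf8.find(needle);
    while (pos != std::string::npos) {
        ++count;
        pos = utf8.find(needle, pos + needle.size());
    }
    return count;
}
```

Reading the XML dump into a std::string in binary mode and calling countSequence(contents, kZwjUtf8) would show whether the file really contains the character in UTF-8 form.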


cheers

Jinesh  K J

On Nov 28, 2007 5:43 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:

> The file you attached is correct, and the same modified DOMPrint that I
> used before returns the ZWJ characters in the content of getTextContent.
> Could you show us the code you are using to read the file?
>
> Alberto
>
> jinesh kj wrote:
> > hi,
> >
> > I dumped the database using the mysql -X command, which gives me the
> > output as an XML file. I don't know whether there is any problem with my
> > XML files. Is there any specific notation to represent ZWJ and ZWNJ in
> > XML files?
> >
> > I am attaching one of my XML files.
> >
> > Thank you for your help. If you have a better idea of what to do with
> > the XML file when I get characters like these, or any links with those
> > details, please point me to them.
> >
> > regards
> >
> > Jinesh K J
> >
> > On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> >
> >     If you can read the original file, but not when you edit it, I would
> >     bet the reason is in the way you edit your XML files (and dump them
> >     from the database). What are you using? Could you attach a small
> >     sample file?
> >
> >     Alberto
> >
> >     jinesh kj wrote:
> >     > hi,
> >     >
> >     > I tried reading the file you sent. It didn't give any error, which
> >     > means it was read correctly. I don't know how to check in the
> >     > debugger, so I don't know whether it read 200d or not. But if I edit
> >     > the XML file and add some text data alongside it, the text is not
> >     > read. Do I have to do anything for that? Basically I am trying to
> >     > read through an XML file which is a dump of a MySQL database. It has
> >     > many ZWJ characters. I don't know whether they are valid according to
> >     > the specified encoding, but since the file was dumped from the
> >     > database using the built-in function, I think the chance of an error
> >     > is very low.
> >     >
> >     > I am using a similar function in my program, but it returns nothing
> >     > when there is a ZWJ in my data.
> >     >
> >     > I hope I am clear. I am able to read XML files without ZWJ easily.
> >     >
> >     > regards
> >     >
> >     > Jinesh K J
> >     >
> >     > On Nov 28, 2007 4:02 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> >     >
> >     >
> >     >> I am attaching a sample XML that contains a U+200D character between
> >     >> a --| and |-- pattern; I modified DOMPrint to issue a
> >     >>
> >     >>            const XMLCh* data=doc->getDocumentElement()->getTextContent();
> >     >>
> >     >> and in the debugger I see that data[4] is \x200D
> >     >> Have you checked that your source XML really has that character?
> >     >> Also, is the representation of the ZWJ character in the XML file
> >     >> valid according to the specified encoding (e.g. in UTF-8, it's
> >     >> 0xE2 0x80 0x8D)?
> >     >>
> >     >> Alberto
> >     >>
> >     >> jinesh kj wrote:
> >     >>
> >     >>> hi,
> >     >>>
> >     >>> Actually, getTextContent is not returning any value when there is
> >     >>> a zero width joiner.
> >     >>>
> >     >>> cheers
> >     >>>
> >     >>> Jinesh K J
> >     >>>
> >     >>> On Nov 28, 2007 3:28 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> >     >>>
> >     >>>> Hi Jinesh,
> >     >>>> which kind of issues are you having? The text returned by
> >     >>>> getTextContent should contain a \x200D value inside. Or have you
> >     >>>> transcoded it into chars?
> >     >>>>
> >     >>>> Alberto
> >     >>>>
> >     >>>> jinesh kj wrote:
> >     >>>>
> >     >>>>
> >     >>>>> hi all,
> >     >>>>>
> >     >>>>> I was trying to read from an XML file where some data has a zero
> >     >>>>> width joiner in it. I used getTextContent in DOMNode. I was able
> >     >>>>> to read the contents without a zero width joiner, but there are
> >     >>>>> some issues with these special characters. What do I have to
> >     >>>>> change? Do I have to make any special settings? Or do I have to
> >     >>>>> use some other function instead?
> >     >>>>>
> >     >>>>> cheers
> >     >>>>> Jinesh K J
> >     >>>>>
> >
> > --
> > My Feelings,Expressions-
> > http://logbookofanobserver.blogspot.com
> >
> > SMC : My computer, My language http://smc.org.in
> > സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
>
>


-- 
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com

SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ

void GetConfig::readConfigFile(string& configFile)
        throw( std::runtime_error )
{
  int flag = 0; 
  // Test to see if the file is ok.

  struct stat fileStatus;

   // stat() returns 0 on success and -1 on failure; the reason is reported in errno (<cerrno>).
   if( stat(configFile.c_str(), &fileStatus) == -1 )
   {
      if( errno == ENOENT )
         throw ( std::runtime_error("Path file_name does not exist, or path is an empty string.") );
      else if( errno == ENOTDIR )
         throw ( std::runtime_error("A component of the path is not a directory."));
      else if( errno == ELOOP )
         throw ( std::runtime_error("Too many symbolic links encountered while traversing the path."));
      else if( errno == EACCES )
         throw ( std::runtime_error("Permission denied."));
      else // ENAMETOOLONG or any other failure
         throw ( std::runtime_error("File can not be read\n"));
   }

	//Configure DOM parser

   m_ConfigFileParser->setValidationScheme( XercesDOMParser::Val_Never );
   m_ConfigFileParser->setDoNamespaces( false );
   m_ConfigFileParser->setDoSchema( false );
   m_ConfigFileParser->setLoadExternalDTD( false );

   try
   {
      m_ConfigFileParser->parse( configFile.c_str() );

      // no need to free this pointer - owned by the parent parser object
      DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();

      // Get the top-level element.
      
      DOMElement* elementRoot = xmlDoc->getDocumentElement();
      if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));

      //Get the children of root node.
      // Get size of the list so as to go through the list.
      DOMNodeList*      children = elementRoot->getChildNodes();
      const  XMLSize_t nodeCount = children->getLength();

	//Go through the list of children till end.(Actually parses each row).
      for( XMLSize_t xx = 0; xx < nodeCount; ++xx )
      {
		DOMNode* CurrentNode = children->item(xx);
		//Get children of the current node(get fields of each row).
		DOMNodeList* CurrentChildren = CurrentNode->getChildNodes();
		const XMLSize_t childCount = CurrentChildren->getLength();
		//Go through the list of fields of each row.
		for(XMLSize_t yy = 0; yy < childCount; ++yy)
		{
			DOMNode* currentField = CurrentChildren->item(yy);
			if( currentField &&  // item() returned a node
			    currentField->getNodeType() == DOMNode::ELEMENT_NODE ) // is element
			{
				// Found node which is an Element. Re-cast node as element
				DOMElement* currentElement
					= dynamic_cast< xercesc::DOMElement* >( currentField );
				// Check the 'name' attribute and compare it with the required fields.
				// If it matches, get the content with getTextContent().
				// transcode() allocates, so release the temporary attribute name after use.
				XMLCh* attrName = XMLString::transcode("name");
				const XMLCh* xmlch_name = currentElement->getAttribute(attrName);
				XMLString::release(&attrName);
						if(XMLString::equals(xmlch_name,ATTR_Block)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							location.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_Line)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							location.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_Word)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							location.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_rectLeft)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							bounds.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_rectRight)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							bounds.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_rectTop)&&flag==1)
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							bounds.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_rectBot)&&flag==1)					 
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							bounds.push_back(atoi(m_OptionA));
						}
						if(XMLString::equals(xmlch_name,ATTR_BookCode))					 
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							bookCode = atoi(m_OptionA);
							if(strcmp(m_OptionA,code)==0) flag=1;
							else flag =0;
						}
						if(XMLString::equals(xmlch_name,ATTR_Text)&&flag==1)					 
						{		
                     			//		DOMText* textdata = dynamic_cast< xercesc::DOMText* >( currentField );
					//		const XMLCh* value = textdata->getTextContent();
							const XMLCh* value = currentElement->getTextContent();
					//		cout<<value<<endl;
							text = XMLString::transcode(value);
						}
						if(XMLString::equals(xmlch_name,ATTR_PageNo))					 
						{		
							const XMLCh* value = currentElement->getTextContent();
							m_OptionA = XMLString::transcode(value);
							page = atoi(m_OptionA);
							if(page==pageno) flag=1;
							else flag = 0;
						}
						if(XMLString::equals(xmlch_name,ATTR_Location)&&flag==1)
						{
							const XMLCh* value = currentElement->getTextContent();
							// Streaming an XMLCh* to cout does not print the text, so only the transcoded string is printed.
							loc = XMLString::transcode(value);
							cout<<loc<<endl;
						}
			}
		}
      }
   }
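   catch( xercesc::XMLException& e )
   {
      char* message = xercesc::XMLString::transcode( e.getMessage() );
      ostringstream errBuf;
      errBuf << "Error parsing file: " << message << flush;
      XMLString::release( &message );
      // Report the failure instead of silently discarding the message.
      throw ( std::runtime_error( errBuf.str() ) );
   }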

}
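
The attached code passes every value through XMLString::transcode, which converts to the local code page, and the thread already asks whether the text was "transcoded into chars". One thing that could be tried is converting the XMLCh text to UTF-8 instead, where U+200D always has a representation. The sketch below is an illustration under assumptions, not a confirmed fix for the problem in this thread. It is plain C++11 with no Xerces calls, utf16ToUtf8 is a hypothetical helper, and it assumes the XMLCh data has been copied into a std::u16string of UTF-16 code units.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (REPLACEMENT CHARACTER).
// (Hypothetical helper for illustration; not part of Xerces.)
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i; // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // U+200D falls in this range and becomes 0xE2 0x80 0x8D.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With a helper like this, the --| ZWJ |-- sample from the thread would come out as the bytes 2D 2D 7C E2 80 8D 7C 2D 2D, so the joiner is still present in the resulting std::string.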
