David Bertoni wrote:
caox wrote:
David Bertoni wrote:
caox wrote:
Alberto Massari wrote:
Have a look at the MemParse sample.
Alberto
caox wrote:
Hi
I am using Xerces-C++ 3.0.1 for XML parsing.
How can I check whether an input source read from a byte stream is
well-formed XML? I want my program to raise an exception when it
receives anything other than XML.
I appreciate your help.
Thanks a lot. I have tried the DOMPrint sample, which throws an
exception as expected when it encounters a malformed XML file.
But when I use XQilla to create a DOMLSParser, it seems to accept
all kinds of files. The code is below:
AutoRelease<DOMLSParser>
parser(xqillaImplementation->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS,
0));
And this puzzled me a lot.
Did you create a custom ErrorHandler instance and install it in the
parser?
Dave
I didn't. But how do I install an ErrorHandler instance in a DOMLSParser
instance? I can't find a setXXX() method for this.
OK, you actually need a DOMErrorHandler, rather than an ErrorHandler.
The DOMPrint sample application has an example of setting the correct
DOMConfiguration property. To adapt it to your use, just call
DOMLSParser::getDomConfig() and then set the property:
DOMErrorHandler* myErrorHandler = new myDOMErrorHandler();
DOMConfiguration* config = parser->getDomConfig();
config->setParameter(XMLUni::fgDOMErrorHandler, myErrorHandler);
Dave
I have modified the code in the DOMPrint sample to use DOMLSParserImpl
instead of XercesDOMParser for testing, as shown below:
DOMLSParserImpl* parser = new DOMLSParserImpl;
DOMConfiguration* domConfig = parser->getDomConfig();
DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
domConfig->setParameter(XMLUni::fgDOMErrorHandler, errReporter);
The result is quite different from using XercesDOMParser:
XercesDOMParser *parser = new XercesDOMParser;
DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
parser->setErrorHandler(errReporter);
The code has been attached. Could you please find where the problem is?
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Id: DOMTreeErrorReporter.cpp 471735 2006-11-06 13:53:58Z amassari $
*/
// ---------------------------------------------------------------------------
// Includes
// ---------------------------------------------------------------------------
#include <xercesc/sax/SAXParseException.hpp>
#include "DOMTreeErrorReporter.hpp"
#if defined(XERCES_NEW_IOSTREAMS)
#include <iostream>
#else
#include <iostream.h>
#endif
#include <stdlib.h>
#include <string.h>
void DOMTreeErrorReporter::warning(const SAXParseException&)
{
//
// Ignore all warnings.
//
}
void DOMTreeErrorReporter::error(const SAXParseException& toCatch)
{
fSawErrors = true;
XERCES_STD_QUALIFIER cerr << "Error at file \"" <<
StrX(toCatch.getSystemId())
<< "\", line " << toCatch.getLineNumber()
<< ", column " << toCatch.getColumnNumber()
<< "\n Message: " << StrX(toCatch.getMessage()) <<
XERCES_STD_QUALIFIER endl;
}
void DOMTreeErrorReporter::fatalError(const SAXParseException& toCatch)
{
fSawErrors = true;
XERCES_STD_QUALIFIER cerr << "Fatal Error at file \"" <<
StrX(toCatch.getSystemId())
<< "\", line " << toCatch.getLineNumber()
<< ", column " << toCatch.getColumnNumber()
<< "\n Message: " << StrX(toCatch.getMessage()) <<
XERCES_STD_QUALIFIER endl;
}
void DOMTreeErrorReporter::resetErrors()
{
fSawErrors = false;
}
/*
* $Id: DOMTreeErrorReporter.hpp 471735 2006-11-06 13:53:58Z amassari $
*/
#include <xercesc/util/XercesDefs.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#if defined(XERCES_NEW_IOSTREAMS)
#include <iostream>
#else
#include <iostream.h>
#endif
XERCES_CPP_NAMESPACE_USE
class DOMTreeErrorReporter : public ErrorHandler
{
public:
// -----------------------------------------------------------------------
// Constructors and Destructor
// -----------------------------------------------------------------------
DOMTreeErrorReporter() :
fSawErrors(false)
{
}
~DOMTreeErrorReporter()
{
}
// -----------------------------------------------------------------------
// Implementation of the error handler interface
// -----------------------------------------------------------------------
void warning(const SAXParseException& toCatch);
void error(const SAXParseException& toCatch);
void fatalError(const SAXParseException& toCatch);
void resetErrors();
// -----------------------------------------------------------------------
// Getter methods
// -----------------------------------------------------------------------
bool getSawErrors() const;
// -----------------------------------------------------------------------
// Private data members
//
// fSawErrors
// This is set if we get any errors, and is queryable via a getter
// method. It's used by the main code to suppress output if there are
// errors.
// -----------------------------------------------------------------------
bool fSawErrors;
};
inline bool DOMTreeErrorReporter::getSawErrors() const
{
return fSawErrors;
}
// ---------------------------------------------------------------------------
// This is a simple class that lets us do easy (though not terribly efficient)
// transcoding of XMLCh data to local code page for display.
// ---------------------------------------------------------------------------
class StrX
{
public :
// -----------------------------------------------------------------------
// Constructors and Destructor
// -----------------------------------------------------------------------
StrX(const XMLCh* const toTranscode)
{
// Call the private transcoding method
fLocalForm = XMLString::transcode(toTranscode);
}
~StrX()
{
XMLString::release(&fLocalForm);
}
// -----------------------------------------------------------------------
// Getter methods
// -----------------------------------------------------------------------
const char* localForm() const
{
return fLocalForm;
}
private :
// -----------------------------------------------------------------------
// Private data members
//
// fLocalForm
// This is the local code page form of the string.
// -----------------------------------------------------------------------
char* fLocalForm;
};
inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream&
target, const StrX& toDump)
{
target << toDump.localForm();
return target;
}
<?xml version="1.0" encoding="utf-8"?>
<!-- @version: -->
<personnel>
/*
* $Id: DOMPrint.cpp 669844 2008-06-20 10:11:44Z borisk $
*/
// ---------------------------------------------------------------------------
// This sample program invokes the XercesDOMParser to build a DOM tree for
// the specified input file. It then invokes DOMLSSerializer::write() to
// serialize the resultant DOM tree back to XML stream.
//
// Note:
// Application needs to provide its own implementation of
// DOMErrorHandler (in this sample, the DOMPrintErrorHandler),
// if it would like to receive notification from the serializer
// in the case any error occurs during the serialization.
//
// Application needs to provide its own implementation of
// DOMLSSerializerFilter (in this sample, the DOMPrintFilter),
// if it would like to filter out certain parts of the DOM
// representation, but must be aware that this may render the
// resultant XML stream invalid.
//
// Application may choose any combination of characters as the
// end of line sequence to be used in the resultant XML stream,
// but must be aware that this may render the resultant XML
// stream ill-formed.
//
// Application may choose a particular encoding name in which
// the output XML stream would be, but must be aware that if
// characters unrepresentable in the specified encoding appear
// in markup, this may force the serializer to terminate
// serialization prematurely, and thus no complete serialization
// would be done.
//
// Application shall query the serializer first, before set any
// feature/mode(true, false), or be ready to catch exception if this
// feature/mode is not supported by the serializer.
//
// Application needs to clean up the filter, error handler and
// format target objects created for the serialization.
//
// Limitations:
// 1. The encoding="xxx" clause in the XML header should reflect
// the system local code page, but does not.
// 2. Cases where the XML data contains characters that can not
// be represented in the system local code page are not handled.
//
// ---------------------------------------------------------------------------
// ---------------------------------------------------------------------------
// Includes
// ---------------------------------------------------------------------------
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/framework/StdOutFormatTarget.hpp>
#include <xercesc/framework/LocalFileFormatTarget.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/parsers/DOMLSParserImpl.hpp>
#include <xercesc/util/XMLUni.hpp>
#include "DOMTreeErrorReporter.hpp"
#include <xercesc/util/OutOfMemoryException.hpp>
#include <string.h>
#include <stdlib.h>
// ---------------------------------------------------------------------------
// Local data
//
// gXmlFile
// The path to the file to parse. Set via command line.
//
// gDoNamespaces
// Indicates whether namespace processing should be done.
//
// gDoSchema
// Indicates whether schema processing should be done.
//
// gSchemaFullChecking
// Indicates whether full schema constraint checking should be done.
//
// gDoCreate
// Indicates whether entity reference nodes need to be created or not
// Defaults to false
//
// gOutputEncoding
// The encoding we are to output in. If not set on the command line,
// then it defaults to the encoding of the input XML file.
//
// gSplitCdataSections
// Indicates whether split-cdata-sections is to be enabled or not.
//
// gDiscardDefaultContent
// Indicates whether default content is discarded or not.
//
// gUseFilter
// Indicates if user wants to plug in the DOMPrintFilter.
//
// gValScheme
// Indicates what validation scheme to use. It defaults to 'auto', but
// can be set via the -v= command.
//
// ---------------------------------------------------------------------------
// ---------------------------------------------------------------------------
//
// main
//
// ---------------------------------------------------------------------------
static bool gDoNamespaces = false;
static bool gDoSchema = false;
static bool gSchemaFullChecking = false;
static bool gDoCreate = false;
static char* goutputfile = 0;
static char* gXPathExpression = 0;
// options for DOMLSSerializer's features
static XMLCh* gOutputEncoding = 0;
static bool gSplitCdataSections = true;
static bool gDiscardDefaultContent = true;
static bool gUseFilter = false;
static bool gFormatPrettyPrint = false;
static bool gWriteBOM = false;
int main(int argC, char* argV[]) {
int retval = 0;
const char* gXmlFile = "personal.xml";
static XMLCh* gOutputEncoding = 0;
// Initialize the XML4C2 system
try {
XMLPlatformUtils::Initialize();
} catch (const XMLException &toCatch) {
XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
<< " Exception message:"
<< StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
return 1;
}
//
// Create our parser, then attach an error handler to the parser.
// The parser will call back to methods of the ErrorHandler if it
// discovers errors during the course of parsing the XML document.
//
DOMLSParserImpl* parser = new DOMLSParserImpl;
DOMConfiguration* domConfig = parser->getDomConfig();
DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
domConfig->setParameter(XMLUni::fgDOMErrorHandler, errReporter);
//
// Parse the XML file, catching any XML exceptions that might propagate
// out of it.
//
bool errorsOccured = false;
try {
parser->parseURI(gXmlFile);
} catch (const OutOfMemoryException&) {
XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" <<
XERCES_STD_QUALIFIER endl;
errorsOccured = true;
} catch (const XMLException& e) {
XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n Message: "
<< StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
errorsOccured = true;
} catch (const DOMException& e) {
const unsigned int maxChars = 2047;
XMLCh errText[maxChars + 1];
XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" <<
gXmlFile << "'\n"
<< "DOMException code is: " << e.code << XERCES_STD_QUALIFIER
endl;
if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) <<
XERCES_STD_QUALIFIER endl;
errorsOccured = true;
} catch (...) {
XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " <<
XERCES_STD_QUALIFIER endl;
errorsOccured = true;
}
// If the parse was successful, output the document data from the DOM tree
if (!errorsOccured && !errReporter->getSawErrors()) {
try {
// get a serializer, an instance of DOMLSSerializer
XMLCh tempStr[3] = {chLatin_L, chLatin_S, chNull};
DOMImplementation *impl =
DOMImplementationRegistry::getDOMImplementation(tempStr);
DOMLSSerializer *theSerializer = ((DOMImplementationLS*)
impl)->createLSSerializer();
DOMLSOutput *theOutputDesc = ((DOMImplementationLS*)
impl)->createLSOutput();
// set user specified output encoding
theOutputDesc->setEncoding(gOutputEncoding);
// plug in user's own error handler
// DOMErrorHandler *myErrorHandler = new DOMPrintErrorHandler();
// DOMConfiguration* serializerConfig = theSerializer->getDomConfig();
// serializerConfig->setParameter(XMLUni::fgDOMErrorHandler, myErrorHandler);
// set feature if the serializer supports the feature/mode
// if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTSplitCdataSections, gSplitCdataSections))
// serializerConfig->setParameter(XMLUni::fgDOMWRTSplitCdataSections, gSplitCdataSections);
//
// if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTDiscardDefaultContent, gDiscardDefaultContent))
// serializerConfig->setParameter(XMLUni::fgDOMWRTDiscardDefaultContent, gDiscardDefaultContent);
//
// if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTFormatPrettyPrint, gFormatPrettyPrint))
// serializerConfig->setParameter(XMLUni::fgDOMWRTFormatPrettyPrint, gFormatPrettyPrint);
//
// if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTBOM, gWriteBOM))
// serializerConfig->setParameter(XMLUni::fgDOMWRTBOM, gWriteBOM);
//
// Plug in a format target to receive the resultant
// XML stream from the serializer.
//
// StdOutFormatTarget prints the resultant XML stream
// to stdout once it receives any thing from the serializer.
//
XMLFormatTarget *myFormTarget;
if (goutputfile)
myFormTarget = new LocalFileFormatTarget(goutputfile);
else
myFormTarget = new StdOutFormatTarget();
theOutputDesc->setByteStream(myFormTarget);
// get the DOM representation
DOMDocument *doc = parser->getDocument();
//
// do the serialization through DOMLSSerializer::write();
//
if (gXPathExpression != NULL) {
XMLCh* xpathStr = XMLString::transcode(gXPathExpression);
DOMElement* root = doc->getDocumentElement();
try {
DOMXPathNSResolver* resolver = doc->createNSResolver(root);
DOMXPathResult* result = doc->evaluate(
xpathStr,
root,
resolver,
DOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE,
NULL);
XMLSize_t nLength = result->getSnapshotLength();
for (XMLSize_t i = 0; i < nLength; i++) {
result->snapshotItem(i);
theSerializer->write(result->getNodeValue(),
theOutputDesc);
}
result->release();
resolver->release();
} catch (const DOMXPathException& e) {
XERCES_STD_QUALIFIER cerr << "An error occurred during processing of the XPath expression. Msg is:"
<< XERCES_STD_QUALIFIER endl
<< StrX(e.getMessage()) << XERCES_STD_QUALIFIER
endl;
retval = 4;
} catch (const DOMException& e) {
XERCES_STD_QUALIFIER cerr << "An error occurred during processing of the XPath expression. Msg is:"
<< XERCES_STD_QUALIFIER endl
<< StrX(e.getMessage()) << XERCES_STD_QUALIFIER
endl;
retval = 4;
}
XMLString::release(&xpathStr);
} else
theSerializer->write(doc, theOutputDesc);
theOutputDesc->release();
theSerializer->release();
//
// Filter, formatTarget and error handler
// are NOT owned by the serializer.
//
delete myFormTarget;
} catch (const OutOfMemoryException&) {
XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" <<
XERCES_STD_QUALIFIER endl;
retval = 5;
} catch (XMLException& e) {
XERCES_STD_QUALIFIER cerr << "An error occurred during creation of output transcoder. Msg is:"
<< XERCES_STD_QUALIFIER endl
<< StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
retval = 4;
}
} else
retval = 4;
//
// Clean up the error handler. The parser does not adopt handlers
// since they could be many objects or one object installed for multiple
// handlers.
//
delete errReporter;
//
// Delete the parser itself. Must be done prior to calling Terminate, below.
//
delete parser;
XMLString::release(&gOutputEncoding);
// And call the termination method
XMLPlatformUtils::Terminate();
return retval;
}