David Bertoni wrote:
caox wrote:
David Bertoni wrote:
caox wrote:
Alberto Massari wrote:
Have a look at the MemParse sample.

Alberto

caox wrote:
Hi
   I am using Xerces-C++ 3.0.1 for XML parsing.
How can I check whether input from a byte stream is well-formed XML? I want my program to raise an exception when it receives anything other than XML.
   I appreciate your help.


Thanks a lot. I have tried the DOMPrint sample, which throws an exception as expected when it encounters a malformed XML file. But when I use XQilla to create a DOM parser, it seems to accept all kinds of files. The code is below:

AutoRelease<DOMLSParser> parser(xqillaImplementation->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0));

And this puzzled me a lot.
Did you create a custom ErrorHandler instance and install it in the parser?

Dave


I didn't. But how do I install an ErrorHandler instance in a DOMLSParser instance? I can't find a setXXX() method for this.
OK, you actually need a DOMErrorHandler, rather than an ErrorHandler.

The DOMPrint sample application has an example of setting the correct DOMConfiguration property. To adapt it to your use, just call DOMLSParser::getDomConfig() and then set the property:

    DOMErrorHandler* myErrorHandler = new myDOMErrorHandler();
    DOMConfiguration* config = parser->getDomConfig();
    config->setParameter(XMLUni::fgDOMErrorHandler, myErrorHandler);

Dave


I have modified the code in DOMPrint sample to use DOMLSParserImpl instead of XercesDOMParser for testing, like below:

   DOMLSParserImpl* parser = new DOMLSParserImpl;
   DOMConfiguration* domConfig = parser->getDomConfig();

   DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
   domConfig->setParameter(XMLUni::fgDOMErrorHandler, errReporter);

And the result is quite different from using XercesDOMParser.

   XercesDOMParser *parser = new XercesDOMParser;

   DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
   parser->setErrorHandler(errReporter);

The code has been attached. Could you please find where the problem is?
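
A possible explanation, offered as a reading of the attached files rather than a confirmed diagnosis: DOMTreeErrorReporter derives from the SAX ErrorHandler, while the fgDOMErrorHandler parameter expects a DOMErrorHandler. DOMConfiguration::setParameter() takes its value as a const void*, so the compiler cannot flag the mismatch. The sketch below models that situation in plain C++ with hypothetical stand-in types (SaxStyleHandler, DomStyleHandler, UntypedConfig, installDomHandler, reportError); none of these names are Xerces classes.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for a SAX-style handler interface (the role ErrorHandler plays).
struct SaxStyleHandler {
    virtual ~SaxStyleHandler() {}
    virtual void fatalError(const std::string& message) = 0;
};

// Stand-in for a DOM-style handler interface (the role DOMErrorHandler plays).
struct DomStyleHandler {
    virtual ~DomStyleHandler() {}
    virtual bool handleError(const std::string& message) = 0;
};

// Stand-in for a configuration object whose setter is untyped.
class UntypedConfig {
public:
    UntypedConfig() : fHandler(0) {}

    void setParameter(const std::string& name, const void* value) {
        if (name == "error-handler")
            fHandler = value;
    }

    const void* rawHandler() const { return fHandler; }

private:
    const void* fHandler;
};

// Typed wrapper: only a DomStyleHandler* gets through, so handing it a
// SaxStyleHandler* becomes a compile error instead of a silent mismatch.
inline void installDomHandler(UntypedConfig& config, DomStyleHandler* handler) {
    config.setParameter("error-handler", handler);
}

// The untyped setter would accept either pointer type; the wrapper accepts one.
static_assert(std::is_convertible<SaxStyleHandler*, const void*>::value,
              "any object pointer converts to const void*");
static_assert(!std::is_convertible<SaxStyleHandler*, DomStyleHandler*>::value,
              "the two handler interfaces are unrelated types");

// Simulates a parser reporting an error to whatever handler was installed.
// The cast back is only valid if a DomStyleHandler* was stored.
inline bool reportError(const UntypedConfig& config, const std::string& message) {
    const void* raw = config.rawHandler();
    if (!raw)
        return false;  // no handler installed: the error goes unnoticed
    DomStyleHandler* handler =
        const_cast<DomStyleHandler*>(static_cast<const DomStyleHandler*>(raw));
    return handler->handleError(message);
}

// Minimal handler used to show that a correctly typed handler is called.
struct CountingDomHandler : DomStyleHandler {
    int calls;
    CountingDomHandler() : calls(0) {}
    bool handleError(const std::string&) { ++calls; return true; }
};
```

In this model, a handler reaches reportError() only if it went in as a DomStyleHandler*. By analogy, deriving the reporter from DOMErrorHandler and implementing handleError() would be the first thing to try in the attached code.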
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Id: DOMTreeErrorReporter.cpp 471735 2006-11-06 13:53:58Z amassari $
 */

// ---------------------------------------------------------------------------
//  Includes
// ---------------------------------------------------------------------------
#include <xercesc/sax/SAXParseException.hpp>
#include "DOMTreeErrorReporter.hpp"
#if defined(XERCES_NEW_IOSTREAMS)
#include <iostream>
#else
#include <iostream.h>
#endif
#include <stdlib.h>
#include <string.h>


void DOMTreeErrorReporter::warning(const SAXParseException&)
{
    //
    // Ignore all warnings.
    //
}

void DOMTreeErrorReporter::error(const SAXParseException& toCatch)
{
    fSawErrors = true;
    XERCES_STD_QUALIFIER cerr << "Error at file \"" << StrX(toCatch.getSystemId())
         << "\", line " << toCatch.getLineNumber()
         << ", column " << toCatch.getColumnNumber()
         << "\n   Message: " << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
}

void DOMTreeErrorReporter::fatalError(const SAXParseException& toCatch)
{
    fSawErrors = true;
    XERCES_STD_QUALIFIER cerr << "Fatal Error at file \"" << StrX(toCatch.getSystemId())
         << "\", line " << toCatch.getLineNumber()
         << ", column " << toCatch.getColumnNumber()
         << "\n   Message: " << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
}

void DOMTreeErrorReporter::resetErrors()
{
    fSawErrors = false;
}


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Id: DOMTreeErrorReporter.hpp 471735 2006-11-06 13:53:58Z amassari $
 */

#include <xercesc/util/XercesDefs.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#if defined(XERCES_NEW_IOSTREAMS)
#include <iostream>
#else
#include <iostream.h>
#endif


XERCES_CPP_NAMESPACE_USE


class DOMTreeErrorReporter : public ErrorHandler
{
public:
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    DOMTreeErrorReporter() :
       fSawErrors(false)
    {
    }

    ~DOMTreeErrorReporter()
    {
    }


    // -----------------------------------------------------------------------
    //  Implementation of the error handler interface
    // -----------------------------------------------------------------------
    void warning(const SAXParseException& toCatch);
    void error(const SAXParseException& toCatch);
    void fatalError(const SAXParseException& toCatch);
    void resetErrors();

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    bool getSawErrors() const;

    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fSawErrors
    //      This is set if we get any errors, and is queryable via a getter
    //      method. It's used by the main code to suppress output if there are
    //      errors.
    // -----------------------------------------------------------------------
    bool    fSawErrors;
};

inline bool DOMTreeErrorReporter::getSawErrors() const
{
    return fSawErrors;
}

// ---------------------------------------------------------------------------
//  This is a simple class that lets us do easy (though not terribly efficient)
//  transcoding of XMLCh data to local code page for display.
// ---------------------------------------------------------------------------
class StrX
{
public :
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    StrX(const XMLCh* const toTranscode)
    {
        // Call the private transcoding method
        fLocalForm = XMLString::transcode(toTranscode);
    }

    ~StrX()
    {
        XMLString::release(&fLocalForm);
    }


    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const char* localForm() const
    {
        return fLocalForm;
    }

private :
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fLocalForm
    //      This is the local code page form of the string.
    // -----------------------------------------------------------------------
    char*   fLocalForm;
};

inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream& target, const StrX& toDump)
{
    target << toDump.localForm();
    return target;
}

<?xml version="1.0" encoding="utf-8"?>
<!-- @version: -->

<personnel>
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Id: DOMPrint.cpp 669844 2008-06-20 10:11:44Z borisk $
 */

// ---------------------------------------------------------------------------
//  This sample program invokes the XercesDOMParser to build a DOM tree for
//  the specified input file. It then invokes DOMLSSerializer::write() to
//  serialize the resultant DOM tree back to XML stream.
//
//  Note:
//  Application needs to provide its own implementation of
//                 DOMErrorHandler (in this sample, the DOMPrintErrorHandler),
//                 if it would like to receive notification from the serializer
//                 in the case any error occurs during the serialization.
//
//  Application needs to provide its own implementation of
//                 DOMLSSerializerFilter (in this sample, the DOMPrintFilter),
//                 if it would like to filter out certain part of the DOM
//                 representation, but must be aware that thus may render the
//                 resultant XML stream invalid.
//
//  Application may choose any combination of characters as the
//                 end of line sequence to be used in the resultant XML stream,
//                 but must be aware that thus may render the resultant XML
//                 stream ill formed.
//
//  Application may choose a particular encoding name in which
//                 the output XML stream would be, but must be aware that if
//                 characters, unrepresentable in the encoding specified, appearing
//                 in markups, may force the serializer to terminate serialization
//                 prematurely, and thus no complete serialization would be done.
//
//  Application shall query the serializer first, before set any
//           feature/mode(true, false), or be ready to catch exception if this
//           feature/mode is not supported by the serializer.
//
//  Application needs to clean up the filter, error handler and
//                 format target objects created for the serialization.
//
//   Limitations:
//      1.  The encoding="xxx" clause in the XML header should reflect
//          the system local code page, but does not.
//      2.  Cases where the XML data contains characters that can not
//          be represented in the system local code page are not handled.
//
// ---------------------------------------------------------------------------


// ---------------------------------------------------------------------------
//  Includes
// ---------------------------------------------------------------------------
#include <xercesc/util/PlatformUtils.hpp>

#include <xercesc/dom/DOM.hpp>

#include <xercesc/framework/StdOutFormatTarget.hpp>
#include <xercesc/framework/LocalFileFormatTarget.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/parsers/DOMLSParserImpl.hpp>
#include <xercesc/util/XMLUni.hpp>

#include "DOMTreeErrorReporter.hpp"
#include <xercesc/util/OutOfMemoryException.hpp>

#include <string.h>
#include <stdlib.h>

// ---------------------------------------------------------------------------
//  Local data
//
//  gXmlFile
//      The path to the file to parse. Set via command line.
//
//  gDoNamespaces
//      Indicates whether namespace processing should be done.
//
//  gDoSchema
//      Indicates whether schema processing should be done.
//
//  gSchemaFullChecking
//      Indicates whether full schema constraint checking should be done.
//
//  gDoCreate
//      Indicates whether entity reference nodes needs to be created or not
//      Defaults to false
//
//  gOutputEncoding
//      The encoding we are to output in. If not set on the command line,
//      then it defaults to the encoding of the input XML file.
//
//  gSplitCdataSections
//      Indicates whether split-cdata-sections is to be enabled or not.
//
//  gDiscardDefaultContent
//      Indicates whether default content is discarded or not.
//
//  gUseFilter
//      Indicates if user wants to plug in the DOMPrintFilter.
//
//  gValScheme
//      Indicates what validation scheme to use. It defaults to 'auto', but
//      can be set via the -v= command.
//
// ---------------------------------------------------------------------------

// ---------------------------------------------------------------------------
//
//  main
//
// ---------------------------------------------------------------------------
static bool gDoNamespaces = false;
static bool gDoSchema = false;
static bool gSchemaFullChecking = false;
static bool gDoCreate = false;

static char* goutputfile = 0;
static char* gXPathExpression = 0;

// options for DOMLSSerializer's features
static XMLCh* gOutputEncoding = 0;

static bool gSplitCdataSections = true;
static bool gDiscardDefaultContent = true;
static bool gUseFilter = false;
static bool gFormatPrettyPrint = false;
static bool gWriteBOM = false;

int main(int argC, char* argV[]) {
    int retval = 0;
    const char* gXmlFile = "personal.xml";
    static XMLCh* gOutputEncoding = 0;
    // Initialize the XML4C2 system
    try {
        XMLPlatformUtils::Initialize();
    } catch (const XMLException &toCatch) {
        XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
                << "  Exception message:"
                << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
        return 1;
    }


    //
    //  Create our parser, then attach an error handler to the parser.
    //  The parser will call back to methods of the ErrorHandler if it
    //  discovers errors during the course of parsing the XML document.
    //

    DOMLSParserImpl* parser = new DOMLSParserImpl;
    DOMConfiguration* domConfig = parser->getDomConfig();

    DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
    domConfig->setParameter(XMLUni::fgDOMErrorHandler, errReporter);

    //
    //  Parse the XML file, catching any XML exceptions that might propagate
    //  out of it.
    //
    bool errorsOccured = false;
    try {
        parser->parseURI(gXmlFile);
    } catch (const OutOfMemoryException&) {
        XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    } catch (const XMLException& e) {
        XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
                << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    } catch (const DOMException& e) {
        const unsigned int maxChars = 2047;
        XMLCh errText[maxChars + 1];

        XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
                << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;

        if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;

        errorsOccured = true;
    } catch (...) {
        XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
        errorsOccured = true;
    }

    // If the parse was successful, output the document data from the DOM tree
    if (!errorsOccured && !errReporter->getSawErrors()) {

        try {
            // get a serializer, an instance of DOMLSSerializer
            XMLCh tempStr[3] = {chLatin_L, chLatin_S, chNull};
            DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
            DOMLSSerializer *theSerializer = ((DOMImplementationLS*) impl)->createLSSerializer();
            DOMLSOutput *theOutputDesc = ((DOMImplementationLS*) impl)->createLSOutput();

            // set user specified output encoding
            theOutputDesc->setEncoding(gOutputEncoding);


            // plug in user's own error handler
            //            DOMErrorHandler *myErrorHandler = new DOMPrintErrorHandler();
            //            DOMConfiguration* serializerConfig = theSerializer->getDomConfig();
            //            serializerConfig->setParameter(XMLUni::fgDOMErrorHandler, myErrorHandler);

            // set feature if the serializer supports the feature/mode
            //            if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTSplitCdataSections, gSplitCdataSections))
            //                serializerConfig->setParameter(XMLUni::fgDOMWRTSplitCdataSections, gSplitCdataSections);
            //
            //            if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTDiscardDefaultContent, gDiscardDefaultContent))
            //                serializerConfig->setParameter(XMLUni::fgDOMWRTDiscardDefaultContent, gDiscardDefaultContent);
            //
            //            if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTFormatPrettyPrint, gFormatPrettyPrint))
            //                serializerConfig->setParameter(XMLUni::fgDOMWRTFormatPrettyPrint, gFormatPrettyPrint);
            //
            //            if (serializerConfig->canSetParameter(XMLUni::fgDOMWRTBOM, gWriteBOM))
            //                serializerConfig->setParameter(XMLUni::fgDOMWRTBOM, gWriteBOM);

            //
            // Plug in a format target to receive the resultant
            // XML stream from the serializer.
            //
            // StdOutFormatTarget prints the resultant XML stream
            // to stdout once it receives any thing from the serializer.
            //
            XMLFormatTarget *myFormTarget;
            if (goutputfile)
                myFormTarget = new LocalFileFormatTarget(goutputfile);
            else
                myFormTarget = new StdOutFormatTarget();
            theOutputDesc->setByteStream(myFormTarget);

            // get the DOM representation
            DOMDocument *doc = parser->getDocument();

            //
            // do the serialization through DOMLSSerializer::write();
            //
            if (gXPathExpression != NULL) {
                XMLCh* xpathStr = XMLString::transcode(gXPathExpression);
                DOMElement* root = doc->getDocumentElement();
                try {
                    DOMXPathNSResolver* resolver = doc->createNSResolver(root);
                    DOMXPathResult* result = doc->evaluate(
                            xpathStr,
                            root,
                            resolver,
                            DOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE,
                            NULL);

                    XMLSize_t nLength = result->getSnapshotLength();
                    for (XMLSize_t i = 0; i < nLength; i++) {
                        result->snapshotItem(i);
                        theSerializer->write(result->getNodeValue(), theOutputDesc);
                    }

                    result->release();
                    resolver->release();
                } catch (const DOMXPathException& e) {
                    XERCES_STD_QUALIFIER cerr << "An error occurred during processing of the XPath expression. Msg is:"
                            << XERCES_STD_QUALIFIER endl
                            << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
                    retval = 4;
                } catch (const DOMException& e) {
                    XERCES_STD_QUALIFIER cerr << "An error occurred during processing of the XPath expression. Msg is:"
                            << XERCES_STD_QUALIFIER endl
                            << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
                    retval = 4;
                }
                XMLString::release(&xpathStr);
            } else
                theSerializer->write(doc, theOutputDesc);

            theOutputDesc->release();
            theSerializer->release();

            //
            // Filter, formatTarget and error handler
            // are NOT owned by the serializer.
            //
            delete myFormTarget;

        } catch (const OutOfMemoryException&) {
            XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
            retval = 5;
        } catch (XMLException& e) {
            XERCES_STD_QUALIFIER cerr << "An error occurred during creation of output transcoder. Msg is:"
                    << XERCES_STD_QUALIFIER endl
                    << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
            retval = 4;
        }

    } else
        retval = 4;

    //
    //  Clean up the error handler. The parser does not adopt handlers
    //  since they could be many objects or one object installed for multiple
    //  handlers.
    //
    delete errReporter;

    //
    //  Delete the parser itself.  Must be done prior to calling Terminate, below.
    //
    delete parser;

    XMLString::release(&gOutputEncoding);

    // And call the termination method
    XMLPlatformUtils::Terminate();

    return retval;
}
