Re: [oXygen-user] Xerces command line parsing?

George Cristian Bina Tue, 17 Oct 2006 08:18:57 -0700

Dear Andrew,

You guessed correctly: specifying the parser class when you perform the transformation makes the XSLT engine use that parser, but this does not turn on validation, so you only get a well-formedness check.
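
To illustrate the difference, here is a minimal sketch of a validating parse written against the standard JAXP API. It is not part of oXygen or of the Xerces samples; the class name ValidatingParse and its methods are made up for this example. It should run with the parser bundled in the JDK (which is derived from Xerces) or with xercesImpl.jar on the classpath. The key line is setValidating(true); without it the parser only checks well-formedness.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class ValidatingParse {

    /** Parses the source with DTD validation on and returns the reported problems. */
    public static List<String> validate(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // without this, only well-formedness is checked
        SAXParser parser = factory.newSAXParser();

        final List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable: record them and keep parsing.
                problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Well-formedness errors are fatal: record them and stop.
                problems.add("fatal, line " + e.getLineNumber() + ": " + e.getMessage());
                throw e;
            }
        };

        try {
            parser.parse(source, handler);
        } catch (SAXException e) {
            // Already recorded by fatalError above.
        }
        return problems;
    }

    /** Convenience wrapper for XML held in a string. */
    public static List<String> validateString(String xml) throws Exception {
        return validate(new InputSource(new StringReader(xml)));
    }

    /** Batch mode: validates every file named on the command line. */
    public static void main(String[] args) throws Exception {
        boolean failed = false;
        for (String path : args) {
            String systemId = new File(path).toURI().toString();
            for (String problem : validate(new InputSource(systemId))) {
                System.err.println(path + ": " + problem);
                failed = true;
            }
        }
        System.exit(failed ? 1 : 0);
    }
}
```

After compiling, something like "java ValidatingParse *.xml" would report both well-formedness and validity problems for each file and return a non-zero exit status if any were found.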

Xerces does not have a command line utility. However, the Xerces samples contain a number of example classes that can be invoked from the command line. See

http://xerces.apache.org/xerces2-j/samples.html
For instance you can use the sax.Counter sample:
http://xerces.apache.org/xerces2-j/samples-sax.html#Counter

Note that you need to download a Xerces distribution to get the samples jar as well; it needs to be on the classpath together with xercesImpl.jar and xml-apis.jar.

A caveat here is that you cannot enable catalog support from the available command line options.
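
If you end up writing a small Java class of your own instead of using the samples, one way around this is to set an entity resolver on the parser. The sketch below is a hand-rolled stand-in for catalog support, not a real XML catalog implementation: it only maps public identifiers to local copies through an in-memory table. The class name PublicIdResolver, its map method, and any local paths you pass to it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only; a simplified
// substitute for a real XML catalog resolver.
public class PublicIdResolver implements EntityResolver {

    // Public identifier -> system identifier (URI) of a local copy.
    private final Map<String, String> localCopies = new HashMap<>();

    /** Registers a local copy for a public identifier; returns this for chaining. */
    public PublicIdResolver map(String publicId, String localSystemId) {
        localCopies.put(publicId, localSystemId);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null) {
            // Returning null tells the parser to resolve the entity itself.
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }
}
```

An instance can be installed with setEntityResolver on the XMLReader obtained from a SAXParser (getXMLReader()) before parsing, with the TEI public identifier from the DOCTYPE mapped to wherever the DTD is stored locally.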


Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com


Andrew Rouner wrote:

Hello,

I am looking for the right syntax and method to be able to batch-parse XML
files from the command line using Xerces.  I need to use Xerces as I am
attempting to replicate parsing using oXygen (which has Xerces as its
default parser).  If anyone can send along the syntax for doing this or can
point me to a resource that can help, I'd very much appreciate it.

I previously used xmllint/LIBXML to do command line parsing of my TEI files,
which worked well for files calling on the TEI xlite DTD.  I am now dealing
with files that use the full TEI and must rely on the xml catalog, i.e.:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY % TEI.mixed 'INCLUDE'>
<!ENTITY % TEI.drama 'INCLUDE'>
<!ENTITY % TEI.corpus 'INCLUDE'>
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.figures 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
<!ENTITY % TEI.transcr 'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>
<!ENTITY % TEI.spoken 'INCLUDE'>
<!ENTITY % TEI.header 'INCLUDE'>
<!ENTITY % ISOlat1 SYSTEM
'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat1.ent'> %ISOlat1;
<!ENTITY % ISOlat2 SYSTEM
'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat2.ent'> %ISOlat2;
<!ENTITY % ISOnum SYSTEM
'http://www.tei-c.org/Entity_Sets/Unicode/iso-num.ent'> %ISOnum;
<!ENTITY % ISOpub SYSTEM
'http://www.tei-c.org/Entity_Sets/Unicode/iso-pub.ent'> %ISOpub;
]>

I need to use Xerces, because I find that the default parser in oXygen
(which is Xerces) can successfully parse these files (and LIBXML does not
work for files using the full TEI due to problems with the DTD).

My best understanding (which may be completely off) is that to use Xerces as
an XML parser on the command line, what I am essentially doing is using the
syntax to run an XML file through an XSL stylesheet (on the assumption that
the source file has to validate to run successfully).

I have modified a previous stylesheet that processes all TEI elements found
in these documents, and I use this syntax:

java com.icl.saxon.StyleSheet -x org.apache.xerces.parsers.SAXParser \
source_file.xml stylesheet.xsl > /dev/null

I am using Xerces as it comes with oXygen (and have not downloaded it
separately).  Since I am only really interested in parsing and not the
output, I redirect it to /dev/null.  I have the following in my bash profile
for the CLASSPATH:

CLASSPATH=$CLASSPATH:/Applications/oxygen/lib/saxon.jar:\
/Applications/oxygen/frameworks/docbook/xsl/extensions/saxon653.jar.ext:\
/Applications/oxygen/lib/xercesImpl.jar
export CLASSPATH

The above command WORKS, and will pick up SOME errors, but is clearly
missing others.  Does anyone have any more straightforward syntax for just
PARSING with Xerces, or have any ideas why some errors (I have tested) are
not being reported through this process?  (One possibility is that it's just
checking well-formedness, not validity, which I need to test further.)

Thanks in advance for any help/suggestions.

Andrew

Andrew Rouner
Digital Library Services
Washington University Libraries
St. Louis, MO

EMAIL:  [EMAIL PROTECTED]

From: Oxygen XML Editor support <[EMAIL PROTECTED]>
Date: Tue, 25 Jul 2006 12:47:23 +0300
To: Andrew Rouner <[EMAIL PROTECTED]>
Subject: Re: Differences in validators/ dtd problems?

Dear Andrew Rouner,

Thank you for contacting us.
The default parser used by oXygen is Xerces 2.8.0 (that is the latest
Xerces version). At first glance, this looks like a problem/bug in XMLLINT.
If you want to invoke Xerces to parse a document from the command line then
you can do that through one of its sample applications:
http://xerces.apache.org/xerces2-j/samples.html

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com


_______________________________________________
oXygen-user mailing list
[email protected]
http://www.oxygenxml.com/mailman/listinfo/oxygen-user
