Re: Review: Serializer API - the related failure.
My notice might be related to the subject. DOMSerializer serializes a newly created Element and DocumentFragment for me, but refuses the root element, throwing:

java.lang.ClassCastException: org.apache.xerces.dom.TextImpl
    at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:577)
    at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
    at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:608)
    at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
    at org.apache.xml.serialize.BaseMarkupSerializer.serialize(BaseMarkupSerializer.java:373)
    at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:397)

Is this a bug or my misuse of the API?

Andy Clark wrote:

> First, I'd like to look at what's currently in the API and then discuss some points of design that I'd like to see in the serializers.
>
> DOMSerializer: I'm sort of surprised that there are methods to serialize a Document, Element, and DocumentFragment but nothing for a generic Node. In fact, if you wanted to serialize a text node or entity reference, you would first have to remove or clone it into a DocumentFragment and serialize that. And it is impossible to serialize things like attributes outside of their container elements. Would it be enough to have the following method?
>
>     public void serialize(Node node) throws IOException;
>
> Method: I don't see a need for this class. If all it's doing is holding string constants, then I would say get rid of it completely. Otherwise, make it a Java "enumeration", like so:
>
>     public class Method implements Serializable {
>         // Constants
>         public static final Method XML = new Method("text/xml");
>         public static final Method HTML = new Method("text/html");
>         public static final Method Text = new Method("text/plain");
>
>         // Data
>         private String type;
>
>         // Constructors
>         protected Method(String type) { this.type = type; }
>
>         // Object methods: equals, hashCode, and toString
>     }
>
> But I think that we could do without it altogether and just make it possible to register new methods with the serializer factory. But I'll get to that in a minute. And the type of the method could be the mime type, which would avoid the need of a set/getMediaType on the OutputFormat object. And if this thing is really representing the mime type, perhaps it should be called such instead of "Method". It would tie in better with existing standards.
>
> OutputFormat: It seems like a good idea to have a kind of properties object like OutputFormat. But it seems that the OutputFormat (and in fact the whole serializer API) is based on serializing to a text markup syntax. This sort of jumps the gun on what I'd like to say in general about the serialization API so I won't go any further at this point. Check out my comments below regarding this matter.
>
> Serializer: I noticed that this design makes use of the SAX interfaces but not of the traversal APIs added with DOM Level 2. Is there a way that we could leverage those interfaces?
>
> SerializerFactory: There's no way to dynamically register OutputMethods or Serializers. I think that there should be a way to do this.
>
> And overall, I'm not sure if we'd be allowed to drop stuff into the org.xml package namespace. Arkin: have you checked on this? And will any of this be superseded by DOM Level 3? At least on the DOM serialization side, that is... Perhaps Arnaud or someone else on the W3C committee can shed light on this.
>
> Okay, now I'd like to make a few comments about what I'd like to see in a serialization API.
>
> First, I don't strictly see serialization as an output to some text markup. As such, I would like a split between binary and character serializers. Currently, there are both setOutputStream() and setWriter() methods on the Serializer objects. If possible, I'd like setOutputStream() to be only on binary serializers and setWriter() to be used on character serializers.
>
> All of the current serializer implementations (XML, HTML, XHTML) would be character serializers and the OutputFormat object seems to go very well with this. On the binary side, however, I can see a situation where SVG gets serialized to a JPEG image. I realize that this overlaps XSL Formatting Objects, though. Perhaps a better example would be an XML serializer that outputs to WBXML.
>
> Is anyone else thinking along these lines?
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

--
Boris Garbuzov.
Mailing address: Box 715, Seattle, Washington, 98111-0715, USA.
E-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]
Telephone: 1(206)781-5165 (home), 1(206)576-4549 (office).
Residential address: 139 NW 104 Street, Seattle, 98177, Wa, USA
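The "enumeration" class quoted above leaves equals, hashCode, and toString as a comment. A minimal sketch of how that pattern might be completed is shown here; the class name MethodSketch and the decision to compare instances by their type string are assumptions made for illustration, not part of the API under review.

```java
import java.io.Serializable;

// Hypothetical completion of the type-safe enumeration pattern from the thread.
// The name MethodSketch is illustrative only.
class MethodSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Constants
    static final MethodSketch XML = new MethodSketch("text/xml");
    static final MethodSketch HTML = new MethodSketch("text/html");
    static final MethodSketch TEXT = new MethodSketch("text/plain");

    // Data
    private final String type;

    // Constructors
    protected MethodSketch(String type) {
        if (type == null) {
            throw new IllegalArgumentException("type must not be null");
        }
        this.type = type;
    }

    // Value-based equality (an assumption): a copy that went through Java
    // serialization still compares equal to the original constant.
    @Override
    public boolean equals(Object other) {
        return other instanceof MethodSketch
                && type.equals(((MethodSketch) other).type);
    }

    @Override
    public int hashCode() {
        return type.hashCode();
    }

    @Override
    public String toString() {
        return type;
    }
}
```

Because equality is based on the mime-type string, new methods registered at run time would compare correctly with the predefined constants, which is one way to reconcile the constants with the registration idea raised in the message.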
Re: Review: Serializer API
Here's a good tutorial on how to use the serialiser:

http://metalab.unc.edu/xml/slides/sd2000west/xmlandjava/185.html

Boris Garbuzov wrote:

> Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
Re: Deep copy of node from other document
The method you are looking for is Document.importNode().

-- Andy

- Original Message -
From: Martin Duerig [EMAIL PROTECTED]

> Sorry for asking novice questions: What is the best strategy to deep copy a node from one document to another one?
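As a minimal sketch of the answer above: importNode() is called on the target document, and passing true as its second argument requests a deep copy. The helper names newDocument and copyInto, and the use of JAXP to create the documents, are assumptions for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ImportNodeSketch {
    // Creates an empty DOM Document with the default JAXP parser.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deep-copies 'source' into 'target' and appends the copy under 'newParent'.
    // The source node and its document are left untouched.
    static Node copyInto(Node source, Document target, Node newParent) {
        Node copy = target.importNode(source, true); // true = include the subtree
        return newParent.appendChild(copy);
    }

    public static void main(String[] args) {
        Document from = newDocument();
        Element item = from.createElement("item");
        item.setAttribute("id", "42");
        item.appendChild(from.createTextNode("hello"));
        from.appendChild(item);

        Document to = newDocument();
        Element root = to.createElement("root");
        to.appendChild(root);

        Node copy = copyInto(item, to, root);
        System.out.println(copy.getOwnerDocument() == to);
        System.out.println(copy.getTextContent());
    }
}
```

The imported copy has no parent until it is appended, so the explicit appendChild() call is what actually places it in the target tree.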
Re: createDocument factory methods
The new API for document creation in DOM Level 2 is not quite as succinct as the old static createDocument() method from the Xerces-C DOM Level 1 implementation, but it's not quite as bad as all that. Here's some sample code, taken from the CreateDOMDocument sample included with Xerces 1.1:

    DOM_DOMImplementation impl;
    DOM_Document doc = impl.createDocument(
        0,                    // root element namespace URI.
        "company",            // root element name
        DOM_DocumentType());  // document type object (DTD).

The old method will stay around for a while, for compatibility purposes, but any new code should use the new DOM_DOMImplementation::createDocument() function.

-- Andy

- Original Message -
From: Blaine Brodie [EMAIL PROTECTED]

> In the xerces-c DOM_Document interface, I noticed that there was a static factory method called createDocument. The comment with this function states that it was added because the DOM API lacks a mechanism for the creation of new documents. However, the DOM2 spec now has a createDocument function in the DOM_DOMImplementation interface which takes a namespace URI, qualified name and document type as parameters, but is not static. To use this new createDocument function without a reference to an existing DOMImplementation, one must use something similar to the following:
>
>     DOM_Document d = DOM_Document::createDocument().getDOMImplementation().createDocument(nsURI, qn, dt);
>
> This seems unintuitive. Are there any plans to either make the new createDocument static and/or move the old DOM_Document factory method over to DOM_DOMImplementation?
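The thread above concerns Xerces-C, but the same DOM Level 2 call exists in the Java binding. As a hedged sketch for comparison only, this creates the same "company" document in Java; obtaining the DOMImplementation through JAXP is an assumption of this example and is not something the thread discusses.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

class CreateDocumentSketch {
    // Builds a new document whose root element is <company>,
    // with no namespace URI and no document type.
    static Document createCompanyDocument() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            return impl.createDocument(
                    null,        // root element namespace URI
                    "company",   // root element qualified name
                    null);       // document type (DTD)
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```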
Re: Review: Serializer API
Mr. Garbuzov, you just found a bug. In XMLSerializer:

    public void endElement( String namespaceURI, String localName, String rawName )
    {
        ElementState state;

        // Works much like content() with additions for closing
        // an element. Note the different checks for the closed
        // element's state and the parent element's state.
        unindent();
        state = getElementState();
        if ( state.empty ) {

In this method the getElementState() call may return null, so trying to access state.empty causes a NullPointerException. To fix this bug we need to check whether state is null first.

Thanks,
Jeffrey Rodriguez
XML4J Support
IBM Cupertino

From: Boris Garbuzov [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Review: Serializer API
Date: Tue, 21 Mar 2000 16:57:05 -0800

> String unexistingName = "unexistingName";
> documentHandler.endElement (unexistingName);
>
> Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
>
> java.lang.NullPointerException:
>     at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
>     at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
>     at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
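The fix described above (check the element state for null before using it) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: EndElementGuardSketch and its ElementState are simplified stand-ins written for this example, not the Xerces XMLSerializer code, and throwing IllegalStateException is just one possible "friendlier" failure.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for a serializer's element bookkeeping.
class EndElementGuardSketch {
    static final class ElementState {
        final String rawName;
        boolean empty = true; // true until the element receives content
        ElementState(String rawName) { this.rawName = rawName; }
    }

    private final Deque<ElementState> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    // Returns the innermost open element, or null when none is open.
    ElementState getElementState() {
        return open.peek();
    }

    void startElement(String rawName) {
        ElementState parent = getElementState();
        if (parent != null && parent.empty) {
            out.append('>');       // close the parent's start tag
            parent.empty = false;
        }
        out.append('<').append(rawName);
        open.push(new ElementState(rawName));
    }

    void endElement(String rawName) {
        ElementState state = getElementState();
        if (state == null) {
            // The guard: fail with a clear message instead of a NullPointerException.
            throw new IllegalStateException(
                    "endElement(\"" + rawName + "\") called with no open element");
        }
        if (state.empty) {
            out.append("/>");
        } else {
            out.append("</").append(rawName).append('>');
        }
        open.pop();
    }

    String output() {
        return out.toString();
    }
}
```

Calling endElement("unexistingName") on a fresh instance now reports the misuse directly, which is the kind of friendlier failure asked for in the original report.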
Re: Review: Serializer API
Andy Clark wrote:

> DOMSerializer: I'm sort of surprised that there are methods to serialize a Document, Element, and DocumentFragment but nothing for a generic Node. In fact, if you wanted to serialize a text node or entity reference, you would first have to remove or clone it into a DocumentFragment and serialize that. And it is impossible to serialize things like attributes outside of their container elements. Would it be enough to have the following method?
>
>     public void serialize(Node node) throws IOException;

Yes, I agree with Andy. A

    public void serialize(Node node) throws IOException;

method may be all we need to support all the Node objects, including Document, Element, DocumentFragment, etc. Look at the way we implement the dom.DOMWriter public void print(Node node) method.

> Method: I don't see a need for this class. If all it's doing is holding string constants, then I would say get rid of it completely. Otherwise, make it a Java "enumeration", like so:
>
>     public class Method implements Serializable {
>         // Constants
>         public static final Method XML = new Method("text/xml");
>         public static final Method HTML = new Method("text/html");
>         public static final Method Text = new Method("text/plain");
>
>         // Data
>         private String type;
>
>         // Constructors
>         protected Method(String type) { this.type = type; }
>
>         // Object methods: equals, hashCode, and toString
>     }
>
> But I think that we could do without it altogether and just make it possible to register new methods with the serializer factory. But I'll get to that in a minute.

I will vote for doing without it if we can.

> And the type of the method could be the mime type, which would avoid the need of a set/getMediaType on the OutputFormat object. And if this thing is really representing the mime type, perhaps it should be called such instead of "Method". It would tie in better with existing standards.

Yes, I think we should rename this class to MimeType and we should use the mime types as suggested by Andy.

> Serializer: I noticed that this design makes use of the SAX interfaces but not of the traversal APIs added with DOM Level 2. Is there a way that we could leverage those interfaces?

I think that this idea has a lot of potential, maybe using the TreeWalker method and allowing a way to register a filter; then the serializer could be customized to output user-selected nodes. A user can select and serialize a view of the DOM this way.

> SerializerFactory: There's no way to dynamically register OutputMethods or Serializers. I think that there should be a way to do this.

I agree, I think that the SerializerFactory could allow dynamic registration of OutputMethods and/or Serializers.

> And overall, I'm not sure if we'd be allowed to drop stuff into the org.xml package namespace. Arkin: have you checked

Yes, we need to figure this out. I think the safest way would be to move this package to the org.apache.xerces namespace. Andy, by the way, this package is in org.apache.xml, not org.xml (am I missing something here?).

> Okay, now I'd like to make a few comments about what I'd like to see in a serialization API.
>
> First, I don't strictly see serialization as an output to some text markup. As such, I would like a split between binary and character serializers. Currently, there are both setOutputStream() and setWriter() methods on the Serializer objects. If possible, I'd like setOutputStream() to be only on binary serializers and setWriter() to be used on character serializers.

I agree with Andy too. I think that we should be able to have binary and character serializers.

> All of the current serializer implementations (XML, HTML, XHTML) would be character serializers and the OutputFormat object seems to go very well with this. On the binary side, however, I can see a situation where SVG gets serialized to a JPEG image. I realize that this overlaps XSL Formatting

Arghh!! Excuse me!! SVG getting serialized to a JPEG image is quite an interesting idea! What is next, SMIL files to native RealAudio files?

> Objects, though. Perhaps a better example would be an XML serializer that outputs to WBXML.

Yes, this is a good idea and I would like to see it implemented. WBXML (binary XML) has a lot of potential, especially for embedded devices, cellular phone applications, PDAs, etc.

Jeffrey Rodriguez
XML Development
IBM Cupertino
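The TreeWalker-with-filter idea in the reply above can be sketched with the standard DOM Level 2 Traversal interfaces. This is only an illustration of selecting a "view" of the DOM: the class FilteredWalkSketch, its method names, and the choice to hide elements by name are assumptions, and the sketch collects element names rather than driving a real serializer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class FilteredWalkSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the names of the elements visible through the filter, in
    // document order. Elements named 'hiddenName' are rejected together
    // with their whole subtree.
    static List<String> visibleElements(Document doc, final String hiddenName) {
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException(
                    "this DOM implementation does not support Traversal");
        }
        NodeFilter filter = new NodeFilter() {
            @Override
            public short acceptNode(Node n) {
                return n.getNodeName().equals(hiddenName)
                        ? FILTER_REJECT : FILTER_ACCEPT;
            }
        };
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, false);

        List<String> names = new ArrayList<>();
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }
}
```

A serializer built on this approach would emit each node as the walker reaches it, so a user-supplied NodeFilter decides what ends up in the output without the tree being copied or modified.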
Re: [Fwd: Help!]
[EMAIL PROTECTED] wrote:

> Did the larger XML document work before? My guess is that you need to increase the default heap size for java.exe. The default heap size is 16MB for JDK 1.1.8. So, with 90,000 lines of XML you'll suck up a big portion of that heap. Try increasing the heap size using the -mx parameter, e.g. for a 64MB heap:
>
>     java -mx64m org.apache.xalan.xslt.Process -in keeper.xml -xsl keeperhtml.xsl
>
> -Rob
>
> Andrea Schneider
> To: [EMAIL PROTECTED]
> cc: (bcc: Robert Weir/CAM/Lotus)
> Subject: [Fwd: Help!]
> Sent by: [EMAIL PROTECTED]
> 03/21/00 11:34 AM
> Please respond to xerces-dev

Hello Rob,

thank you for the quick answer. I tried it, and at first it seemed to be the right way, but even after increasing the heap to 128MB I hit a limit. I want (in this first test) to write 100 invoices. The limit is reached at 68 invoices. Is there another way to reduce the memory requirements?

-Andrea
Re: [Xerces-C] Patches for OS/2 port
[EMAIL PROTECTED] wrote:

> Thanks Bill. We'll try to get those in next week.

So, uh... Are those patches going to make it in?

--Bill
Strange way to handle white spaces during parsing
Hi,

We have noticed that, with an XML file such as:

<A>
  <B>
    <C> Bono </C>
  </B>
</A>

if one asks how many children the <A> element has (using getLength() on getDocumentElement().getChildNodes()), the answer is 3. After investigation, the whitespace between <A> and <B>, and between </B> and </A>, is interpreted as text nodes.

On the same example, the XML parser included in IE 5.0 ignores this whitespace and the answer is 1. We think that 1 is the correct answer.

I really wonder: Isn't it weird? Is it the normal behavior? Is there a means to disable this, and have an IE-5.0-like behaviour? Is it an actual compliance to XML and DOM specs?

Regards,
jean-christophe broudin.
Re: Status of COM wrapper (was xerces for Delphi)
On Wed, 22 Mar 2000 09:52:17 -0500, [EMAIL PROTECTED] wrote:

> IBM recently donated a COM wrapper of Xerces to Apache. It looks like the sources are checked into CVS, but no binary distribution has yet been released. Other than that, I don't know the status.
>
> -Rob

Thanks,
Franz-Leo
Re: Strange way to handle white spaces during parsing
/ Jean-Christophe Broudin [EMAIL PROTECTED] was heard to say:
| We have noticed that, with an XML file such as:
|
| <A>
|   <B>
|     <C> Bono </C>
|   </B>
| </A>
|
| if one asks how many children the <A> element has (using getLength() on
| getDocumentElement().getChildNodes()), the answer is
| 3. After investigation, the whitespace between <A> and <B>, and between
| </B> and </A>, is interpreted as text nodes.
|
| On the same example, the XML parser included in IE 5.0 ignores this
| whitespace and the answer is 1. We think that 1 is the correct answer.

The correct answer is three.
Fwd: Strange way to handle white spaces during parsing
> I really wonder: Isn't it weird?

No, it is not. Read section 2.10 of the XML 1.0 Recommendation.

> Is it the normal behavior?

Yes, it is.

> Is there a means to disable this, and have an IE-5.0-like behaviour?

Yes, set the http://apache.org/xml/features/dom/include-ignorable-whitespace feature to false (the default is true, thus the white space is included).

> Is it an actual compliance to XML and DOM specs?

Are you asking whether we are compliant with the XML and DOM specs? Yes, we are. Again, read the spec.

> Regards,
> jean-christophe broudin.

Thanks,
Jeffrey Rodriguez
XML Development
IBM Cupertino
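To make the numbers in this thread concrete, here is a minimal sketch that parses the sample document and counts the children of the root both ways. It uses only standard JAXP and DOM calls and filters on the application side; the class and method names are assumptions for this example, and the Xerces feature mentioned above is deliberately not used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceChildrenSketch {
    // The sample from the thread; the exact indentation is an assumption.
    static final String XML = "<A>\n  <B>\n    <C> Bono </C>\n  </B>\n</A>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts every child node, including whitespace-only text nodes.
    static int countChildren(Node parent) {
        return parent.getChildNodes().getLength();
    }

    // Counts only the element children, skipping text nodes.
    static int countElementChildren(Node parent) {
        int count = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node root = parse(XML).getDocumentElement();
        System.out.println("all children:     " + countChildren(root));
        System.out.println("element children: " + countElementChildren(root));
    }
}
```

Counting by node type gives the "IE-like" figure without asking the parser to discard anything, so it works the same whether or not a DTD is available.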
Re: Strange way to handle white spaces during parsing
/ Juergen Hermann [EMAIL PROTECTED] was heard to say:
| > Is there a means to disable this, and have an IE-5.0-like behaviour?
|
| Yes. Write and use a DTD, so the parser knows that <A> does not contain mixed
| content. Note that this does not mean that you need to use the validating
| parser.

The Xerces non-validating parser flags ignorable whitespace even when it's not validating? How does it know? :-)

Be seeing you,
norm

--
Norman Walsh [EMAIL PROTECTED] | It is seldom that any liberty is
http://nwalsh.com/            | lost all at once.--David Hume
[Xerces-J] Serializable Documents
I noticed that DocumentImpl implements Serializable (as does NodeImpl), but several other classes (like DocumentTypeImpl) don't. I'm trying to send a DocumentImpl back from an EJB server, and obviously it's not working. I'm assuming that the preferred method of doing this is just to use the serializers. I was hoping to avoid reconstructing the DOM tree. Any thoughts?

Kito D. Mann
Virtua Communications Corp.
Bug in tutorial
Thanks to those who pointed me to the tutorial at http://metalab.unc.edu/xml/slides/sd2000west/xmlandjava/189.html

Just the code lines

    BigInteger low = BigInteger.ZERO;
    BigInteger high = BigInteger.ONE;

need to be changed into

    BigInteger low = new BigInteger("0");
    BigInteger high = new BigInteger("1");
RE: SVG goes to DOM
Steven Coffman wrote:

> If SVG is going to be DOM based, rather than treated as a special case form of XSL:FO, then that immediately says "Xerces" to me. Should an implementation of the W3C's SVG DOM be part of Xerces? Does that allow us to do anything cool? Should it continue to be part of FOP? If so, how can we be consistent with the Xerces stuff?

I don't think that there should be any SVG-specific code in the Xerces DOM implementation. That said, if you were able to create an SVGDOMDocument that could override the constructors of NodeImpl et al, you might be able to allow anyone to create extended DOMs without cutting and pasting the existing DOM code.

On kind of a related and not fully fleshed out thought, the xml-dev list recently had a discussion on the pros and cons of SVG's use of long attributes for path data. One of the key performance issues that swayed the decision for the attribute form was that if the path was described as elements, the DOM grew so large that memory use and performance were unacceptable. In effect, the switch from an XML representation to a microparsed representation was a way to trick the DOM into not fully expanding the document to objects. The SVGPathData interface is then an attempt to regain some of the lost functionality.

Basically, that suggested to me that if you were able to hint that, at a certain place in the document, a flyweight implementation of Node should be used (child content held as a single string, with flyweight implementations of Node and Attribute mapped onto the string on use), you could have the best of both worlds: keep the XML representation and DOM access while avoiding the memory bloat of fully expanding the tree.
Re: [Xerces-C] Patches for OS/2 port
Did you mail me the OS/2 platform/compiler files? If you did, then my tiny brain must have deleted that message by mistake, since the last message I see from you in my inbox is the big diff message you sent. Can you send me the OS/2 files in a zip and I'll pass them along to Rahul to integrate? Sorry if you already sent them, I promise I'll get it right this time.

Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]

Bill Schindler [EMAIL PROTECTED] on 03/22/2000 01:48:35 AM
Please respond to [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc:
Subject: Re: [Xerces-C] Patches for OS/2 port

> [EMAIL PROTECTED] wrote:
>> Thanks Bill. We'll try to get those in next week.
>
> So, uh... Are those patches going to make it in?
>
> --Bill
Re: [Xerces-C] Patches for OS/2 port
Bill,

Can you just send your OS/2 files to Jeffrey? He has an OS/2 development machine set up, can test out your changes, and has commit access. So mail them to: [EMAIL PROTECTED]

If you've already mailed them to me, that's ok. I'll just vector them to him. But if you see this message first, just send them on to Jeffrey.

Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]
Re: change of Node.getOwnerDocument()
Yes. This was changed to be conformant to the spec.

Boris Garbuzov wrote:

> I am playing with the Xerces API as if I were unit-testing it on a basic level: applying every available method in such a way that it just does not fail. And I noticed a change in Node.getOwnerDocument() after downloading 1.0.3. Now it returns null when the node is the document itself, whereas it returned a proper document in earlier versions. It seems the later behavior is according to the specification that always existed? Did it use to be a bug?

--
<person name="Ralf I. Pfeiffer" loc="IBM JTC, Cupertino, CA" email1="[EMAIL PROTECTED]" email2="[EMAIL PROTECTED]" />
RE: Generate XML from DTD
Sangita Gupta wrote:

> I wish to write Java code which can generate an XML document based on a given DTD: walk through the DTD, know which tags (names) to include in the XML document, grab the values from a string, and generate the document. Is it possible? Any example will be greatly appreciated.

I have been researching this also, and there would seem to be a lot of demand since lots of corporate data is in flat files. I have basically seen three approaches:

1. Create a Java application that reads in your DTD, reads in your data (a line at a time) and then creates the internal DOM structure using those two sources. Once you have the internal DOM built, you can serialize the XML as an output stream.

   Advantages - everything is done in one program
   Disadvantages - fairly complex and dependent upon a specific implementation of parser (IBM xml4j version 2 API)

   Take a look at "XML and Java: Building Web Applications" by Maruyama, Tamura and Uramoto, chapter 3.

2. (courtesy of Doug Tidwell) A simpler approach would be a Java application that reads in your data and simply creates XML tags associated with each column. Something like this:

   <?xml version="1.0"?>
   <document>
     <row>
       <column1>x</column1>
       <column2>x</column2>
       .
     </row>
     more rows
   </document>

   Then you use Xalan and stylesheets to render this XML format into a format suitable for your particular application, where you do the conversion of columns to actual meaningful tags:

   ...
   <employees>
     <employee sex="F">
       <serial_number>00</serial_number>
       <name>
         <first_name>john</first_name>
         <last_name>doe</last_name>
       </name>
       .

   The last step is to run it against the parser to validate against your DTD.

   Advantage - much simpler and parser independence
   Disadvantage - more programs/steps/coordination required

3. Buy from a vendor (I am researching this but have not found much).

Tom Watson
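The first step of approach 2 above (turn each flat-file line into a row of generic column elements) might look like the following sketch. The comma delimiter, the class name FlatFileToXmlSketch, and the use of an in-memory list of lines are assumptions for illustration; the later stylesheet and validation steps are not shown.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FlatFileToXmlSketch {
    // Builds <document><row><column1>..</column1><column2>..</column2></row>...</document>
    // from comma-separated lines (the delimiter is an assumption).
    static Document toDocument(List<String> lines) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);
            for (String line : lines) {
                Element row = doc.createElement("row");
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                for (int i = 0; i < fields.length; i++) {
                    Element column = doc.createElement("column" + (i + 1));
                    column.appendChild(doc.createTextNode(fields[i].trim()));
                    row.appendChild(column);
                }
                root.appendChild(row);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDocument(Arrays.asList("00,john,doe", "01,jane,roe"));
        System.out.println(doc.getElementsByTagName("row").getLength() + " rows built");
    }
}
```

Keeping this step free of any knowledge about the target vocabulary is what gives the approach its parser independence: the mapping from columnN to meaningful tags lives entirely in the stylesheet.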
Re: Review: Serializer API
How about changing Method not to MimeType but rather to ContentType? A construct I use *very* often in my HTML files is the following:

    <meta http-equiv='content-type' content='text/html; charset=x-sjis'>

Which would correspond to an equivalent response line from a web server or a header line in an email message:

    Content-Type: text/html; charset=x-sjis

Then the ContentType could be the one used to specify the encoding of the output stream. Which brings me back to my binary vs. character serializer comment...

Clearly, in Java, OutputStream is for binary output and Writer is for character output. However, what's implied is that the program has constructed the appropriate writer that converts the Unicode characters to the appropriate byte sequences in that encoding. So a Writer object is really writing to an OutputStream.

But by changing character serializers to only support Writer, it would appear that we're putting the onus on the programmer to both specify the charset for the encoding type (so that any reference to the charset in the output is correct -- e.g. the XML encoding names, IANA, are *not* the same as the Java encoding names) *and* create an output writer with the appropriate Java encoding!

However, I think that we can work through this by providing a convenience mapping that creates the appropriate writer from a given output stream. Here's a quick example:

    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.UnsupportedEncodingException;
    import java.io.Writer;

    public class ContentType {
        // Data
        protected String type;
        protected String charset;

        // Constructors
        public ContentType(String type, String charset) {
            this.type = type;
            this.charset = charset;
        }

        // Public methods
        public Writer createWriter(OutputStream out)
            throws UnsupportedEncodingException {
            // Placeholder: map the IANA charset name to the Java encoding name here
            String javaEncoding = charset;
            return new OutputStreamWriter(out, javaEncoding);
        }

        // ... etc ...
    }

What do you think? Or maybe a static method would be better? Hmmm... I'm just brainstorming here. I'd really like to hear other people's opinions.

(I noticed that Sun's JavaMail extension has something similar: javax.mail.internet.ContentType)

--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
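A runnable variation on the ContentType idea above is sketched here. It leans on java.nio.charset.Charset.forName(), which accepts canonical charset names and aliases, to stand in for the IANA-to-Java name mapping; that API is a later addition to Java and was not available to the thread's participants, so treat its use, the class name ContentTypeSketch, and the encode() helper as assumptions of this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class ContentTypeSketch {
    private final String type;
    private final String charset;

    ContentTypeSketch(String type, String charset) {
        this.type = type;
        this.charset = charset;
    }

    // Creates a Writer that encodes characters into 'out' using this
    // content type's charset. Charset.forName does the name lookup.
    Writer createWriter(OutputStream out) throws UnsupportedEncodingException {
        try {
            return new OutputStreamWriter(out, Charset.forName(charset));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new UnsupportedEncodingException(charset);
        }
    }

    // Renders the value as it would appear in a Content-Type header.
    @Override
    public String toString() {
        return type + "; charset=" + charset;
    }

    // Demo helper (hypothetical): encodes 'text' and returns the raw bytes.
    static byte[] encode(String type, String charset, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new ContentTypeSketch(type, charset).createWriter(bytes)) {
            writer.write(text);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }
}
```

With this shape the caller hands over only an OutputStream and a content type, and the charset named in the output and the encoding actually used for the bytes cannot drift apart, which is the concern raised in the message.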
Re: Generate XML from DTD
Thanks Tom. This gives me something to chew on. I will research it further and keep you posted with the latest.

Sangita

Watson, Tom wrote:

> I have been researching this also, and there would seem to be a lot of demand since lots of corporate data is in flat files. I have basically seen three approaches:
>
> 1. Create a Java application that reads in your DTD, reads in your data (a line at a time) and then creates the internal DOM structure using those two sources. Once you have the internal DOM built, you can serialize the XML as an output stream.
>
>    Advantages - everything is done in one program
>    Disadvantages - fairly complex and dependent upon a specific implementation of parser (IBM xml4j version 2 API)
>
>    Take a look at "XML and Java: Building Web Applications" by Maruyama, Tamura and Uramoto, chapter 3.
>
> 2. (courtesy of Doug Tidwell) A simpler approach would be a Java application that reads in your data and simply creates XML tags associated with each column. Something like this:
>
>    <?xml version="1.0"?>
>    <document>
>      <row>
>        <column1>x</column1>
>        <column2>x</column2>
>        .
>      </row>
>      more rows
>    </document>
>
>    Then you use Xalan and stylesheets to render this XML format into a format suitable for your particular application, where you do the conversion of columns to actual meaningful tags:
>
>    ...
>    <employees>
>      <employee sex="F">
>        <serial_number>00</serial_number>
>        <name>
>          <first_name>john</first_name>
>          <last_name>doe</last_name>
>        </name>
>        .
>
>    The last step is to run it against the parser to validate against your DTD.
>
>    Advantage - much simpler and parser independence
>    Disadvantage - more programs/steps/coordination required
>
> 3. Buy from a vendor (I am researching this but have not found much).
>
> Tom Watson
Re: Strange way to handle white spaces during parsing
/ [EMAIL PROTECTED] was heard to say:
| If a DTD is present, it's read and the information required to make this
| decision is present. It doesn't require validation, just a check to see
| what type of content model the element has.

I'm not comfortable with that answer at all. I think an option that ignores element whitespace in a non-validating parse is non-standard and potentially dangerous.

Consider:

The XML 1.0 REC, Section 2.10:

    An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

I can't think of any way to interpret that such that a non-validating parse could ignore whitespace.

Consider the following example:

<!DOCTYPE test [
<!ELEMENT a (b+)>
<!ELEMENT b (#PCDATA)>
]>
<a>test<b/> <b/>this! 4 or 5?</a>

Does <a> have four children or five? The answer has to be five.

And what about a document with an external subset that has parameter entities that cannot be located, so that the DTD is really half a loaf? Does it ignore whitespace in content models that it found, but not in others?

Be seeing you,
norm

--
Norman Walsh [EMAIL PROTECTED] | As a general rule, the most
http://nwalsh.com/            | successful man in life is the man
                              | who has the best
                              | information.--Benjamin Disraeli
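The "four or five" question above can be checked with a short sketch that parses the example in non-validating mode and counts the children of the root. The class name and the use of the default JAXP parser are assumptions of this example; it only reports what one parser does under default settings and does not settle what other parsers or options may do.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MixedContentCountSketch {
    // The well-formed but invalid example from the message above.
    static final String XML =
            "<!DOCTYPE test [\n"
            + "<!ELEMENT a (b+)>\n"
            + "<!ELEMENT b (#PCDATA)>\n"
            + "]>\n"
            + "<a>test<b/> <b/>this! 4 or 5?</a>";

    // Parses without validation and returns the number of child nodes
    // directly under the document element.
    static int childCount(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);                      // the JAXP default
            factory.setIgnoringElementContentWhitespace(false); // also the default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("children of <a>: " + childCount(XML));
    }
}
```

The single space between the two <b/> elements is the node in dispute: it is kept here because nothing has asked the parser to drop it, which matches the reading of section 2.10 given in the message.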
Re: Xerces bug: base URI and external parsed entities
Norman Walsh wrote:

> Is it possible to access the URI of the document currently being parsed from XMLDTDScanner.scanEntityDecl()?

I think you can query the XMLEntityHandler for this information. XMLDTDScanner has a reference to it.

--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
Re: Strange way to handle white spaces during parsing
Norman Walsh wrote:

> <!DOCTYPE test [
> <!ELEMENT a (b+)>
> <!ELEMENT b (#PCDATA)>
> ]>
> <a>test<b/> <b/>this! 4 or 5?</a>

<!ELEMENT a (#PCDATA|b)*>
<!ELEMENT b EMPTY>

--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
Re: Strange way to handle white spaces during parsing
/ Andy Clark [EMAIL PROTECTED] was heard to say:
| Norman Walsh wrote:
| > <!DOCTYPE test [
| > <!ELEMENT a (b+)>
| > <!ELEMENT b (#PCDATA)>
| > ]>
| > <a>test<b/> <b/>this! 4 or 5?</a>
|
| <!ELEMENT a (#PCDATA|b)*>
| <!ELEMENT b EMPTY>

Huh? I guess I wasn't clear. I explicitly constructed a document that was well-formed but not valid. My question/comments had to do with ignorable whitespace in a non-validating parse.

The non-validating parser might think that <a> had element content and imagine that it could throw whitespace away. But it would be wrong.

Be seeing you,
norm

--
Norman Walsh [EMAIL PROTECTED] | The shortness of life can neither
http://nwalsh.com/            | dissuade us from its pleasures,
                              | nor console us for its
                              | pains.--Vauvenargues
Re: [Xerces-J] Serializable Documents
[EMAIL PROTECTED] wrote:

> I noticed that DocumentImpl implements Serializable (as does NodeImpl), but several other classes (like DocumentTypeImpl) don't. I'm trying to send a DocumentImpl back from an EJB server, and obviously it's not working. I'm assuming that the preferred method of doing this is just to use the serializers. I was hoping to avoid reconstructing the DOM tree. Any thoughts?

I don't understand this one because I am able to serialize a document (the standard Serializable way) and read it back in again w/o errors. Take a look at the sample program I've attached to this message. What is your EJB server doing differently about serializing these objects?

Anyway, I think it's more efficient to write the document to XML form, serialize *that*, and reparse it on the other end than to use Java serialization of Objects. The standard Serializable format on the wire is huge.

--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

Attachment: Serializable.java (application/unknown-content-type-java_auto_file)
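The suggestion above (write the document as XML text, send that, and reparse on the other end) can be sketched as a pair of helper methods. The sketch uses the JAXP identity transformer rather than the Xerces serializer classes discussed elsewhere in this thread; that substitution, along with the class and method names, is an assumption made so the example stays self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlRoundTripSketch {
    // Sender side: turns the DOM tree into XML text. A String is itself
    // Serializable, so it can cross an RMI/EJB boundary as-is.
    static String toXml(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: reparses the XML text into a fresh DOM tree.
    static Document fromXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document original = fromXml("<invoice id=\"7\"><total>12.50</total></invoice>");
        Document copy = fromXml(toXml(original));
        System.out.println(copy.getDocumentElement().getNodeName());
    }
}
```

The receiver does rebuild the tree, which is what the original poster hoped to avoid, but the payload is plain text and does not depend on every DOM implementation class being Serializable.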
Re: Review: Serializer API
Andy Clark wrote: First, I'd like to look at what's currently in the API and then discuss some points of design that I'd like to see in the serializers. DOMSerializer: I'm sort of surprised that there are methods to serialize a Document, Element, and DocumentFragment but nothing for a generic Node. In fact, if you wanted to serialize a text node or entity reference, you would first have to remove or clone it into a DocumentFragment and serialize that. And it is impossible to serialize things like attributes outside of their container elements. Would it be enough to have the following method? public void serialize(Node node) throws IOException; I tried to stick to the W3C model which defines a document or a document fragment, so if you want to print just an element, I think it makes sense to use a document fragment. As for serializing Node that happens to be an Attribute, keep in mind that we're trying to define an API used by a lot of serializers. The question that should be raised is: would it be trivial for them to support it? Would a PDF serializer support that? But I think that we could do without it altogether and just make it possible to register new methods with the serializer factory. But I'll get to that in a minute. If there is an agreement on that, I'll just make Method (which is designed to hold the default output method names, nothing more) part of the helpers class or kill it. I think it makes sense for documentation the common methods, see comments below, it's not essential for anything to work. And the type of the method could be the mime type which would avoid the need of a set/getMediaType on the OutputFormat object. And if this thing is really representing the mime type, perhaps it should be called such instead of Method. It would tie in better with existing standards. 
XSLT defines an output method which has one of three names (xml, html, text) or a qualified name for additional methods (like PDF, SVG, etc.). It then defines media-type as a separate value. I don't like it, but it's part of the spec and the serializers have to support that for the sake of XSLT processing. To select a serializer you use the method name. Generally serializers do not care about the media type, but if we have a Servlet getting an XSLT response, it would probably want to use the media type as the content type. This is why getOutputFormat() exists: to extract the output format and determine the media type. The default output formats (and more can be supported) are defined in the helpers class, all of which provide values for both method and media type. In addition, the factory allows one to get an output format suitable for a given output method, so you can determine the media type. Not the best design, I agree, but one which follows the XSLT specs. OutputFormat: It seems like a good idea to have a kind of properties object like OutputFormat. But it seems that the OutputFormat (and in fact the whole serializer API) is based on serializing to a text markup syntax. This sort of jumps the gun on what I'd like to say in general about the serialization API so I won't go any further at this point. Check out my comments below regarding this matter. No, the serializer API does not assume markup; it was designed to support PDF, JPEG, and other binary formats. An implementation should by default support the three common text formats, but the API is designed so other formats can be introduced as well. Once again, if you read the XSLT spec, it clearly defines xml, html and text, and does not define, but allows, other output methods. I followed the same guidelines in coming up with this API. Serializer: I noticed that this design makes use of the SAX interfaces but not of the traversal APIs added with DOM Level 2. Is there a way that we could leverage those interfaces?
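The split between output method and media type described above can be illustrated with a short sketch. It assumes the standard JAXP javax.xml.transform API, which exposes the same two XSLT output properties, and is not the OutputFormat/getOutputFormat() API being discussed; the class and method names are invented for the example.

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

// Hypothetical illustration: method selects the serializer, media type is separate metadata.
class MethodVersusMediaType {

    /** Configure an identity transformer for the "html" output method. */
    static Transformer htmlOutput() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");          // picks the serializer
            t.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/html"); // carried alongside it
            return t;
        } catch (Exception e) {
            throw new IllegalStateException("could not create transformer", e);
        }
    }

    /** What a servlet might use as its Content-Type header. */
    static String contentTypeFor(Transformer t) {
        return t.getOutputProperty(OutputKeys.MEDIA_TYPE);
    }
}
```

The serializer itself never needs the media type; only the code that frames the response (the servlet in the example above) reads it back.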
It would make sense to support traversal for the DOMSerializer. What would be the API requirements for that (other than serializer(iterator))? SerializerFactory: There's no way to dynamically register OutputMethods or Serializers. I think that there should be a way to do this. By definition the SerializerFactory is one way - but not the only way - of obtaining serializers. You can also construct them directly. So there is no need to go overboard with over-generalizing it. For registering serializers, I actually had a method for it, but I had to pull it off and rethink it, since it would work better if it registers both a serializer and a default OutputFormat. I would definitely like to see a registration mechanism in the final API. And overall, I'm not sure if we'd be allowed to drop stuff into the org.xml package namespace. Arkin: have you checked on this? And will any of this be superseded by DOM Level 3? at least on the DOM serialization side, that is... Perhaps Arnaud or someone else on the W3C committee can shed light on this. We are not yet dropping anything. There are two proposals, the Serializer
Re: Review: Serializer API
I could not understand: were you just sending an endElement without a startElement? (I can't check the line number right now; I have a newer copy on my machine that hopefully fixes this bug.) arkin
Boris Garbuzov wrote:
    String unexistingName = "unexistingName";
    documentHandler.endElement (unexistingName);
Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
    java.lang.NullPointerException:
        at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
        at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
        at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
-- Assaf Arkin, CTO, Exoffice Technologies, Inc. www.exoffice.com www.exolab.org
Upcoming Re: Review: Serializer API
Sorry for not being available lately, we have two major releases scheduled for the O'Reilly conference. No time to breathe. The following is scheduled for release RSN: The serializers have been revised to include some bug fixes, performance improvements and preliminary support for encodings. They have also been brought up to speed with the proposed API. A WMLSerializer will be introduced along with a WML DOM (contributed by David Li). Minor bug fixes to the HTML DOM, and a version of the HTML parser for testing purposes. arkin -- Assaf Arkin, CTO, Exoffice Technologies, Inc. www.exoffice.com www.exolab.org
PATCH: Re: Xerces bug: base URI and external parsed entities
The following patch seems to fix the relative URI bug. If (one of) the Xerces maintainers deems it worthy, please check it in :-)
Index: XMLDTDScanner.java
===
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
retrieving revision 1.4
diff -r1.4 XMLDTDScanner.java
1200a1201,1219
    // [EMAIL PROTECTED]
    //
    // An fSystemLiteral value from an entity declaration may be
    // a relative URI. If so, it's important that we make it
    // absolute with respect to the context of the document that
    // we are currently reading. If we don't, the XMLParser will
    // make it absolute with respect to the point of *reference*,
    // before attempting to read it. That's definitely wrong.
    //
    String litSystemId = fStringPool.toString(fSystemLiteral);
    String absSystemId = fEntityHandler.expandSystemId(litSystemId);
    if (!absSystemId.equals(litSystemId)) {
        // REVISIT - Is it kosher to touch fStringPool directly?
        // Is there a better way? fEntityReader doesn't seem to
        // have an addString method that takes a literal string.
        fSystemLiteral = fStringPool.addString(absSystemId);
    }
2376a2396
Be seeing you, norm
-- Norman Walsh [EMAIL PROTECTED] | http://nwalsh.com/ | Nothing ever gets anywhere. The earth keeps turning round and gets nowhere. The moment is the only thing that counts. --Jean Cocteau
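To see why the point of declaration matters, here is a small sketch of the two possible resolutions. It uses java.net.URI as a stand-in for whatever fEntityHandler.expandSystemId does internally, which is an assumption; the class name and the example.org URLs are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration of relative system-identifier resolution.
class SystemIdResolution {

    /** Make systemId absolute with respect to baseUri (already-absolute IDs are returned unchanged). */
    static String expand(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        String declaredIn = "http://example.org/dtd/book.dtd";   // where the entity is declared
        String referencedFrom = "http://example.org/docs/ch1.xml"; // where the entity is referenced
        String systemId = "ents/chars.ent";

        // Resolving at the point of declaration, as the patch intends:
        System.out.println(expand(declaredIn, systemId));
        // Resolving at the point of reference, which the patch comment calls wrong:
        System.out.println(expand(referencedFrom, systemId));
    }
}
```

The two calls yield different locations for the same literal, so expanding the literal while the declaring document is still the current context removes the ambiguity before the reference is ever followed.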
Re: [Xerces-J] Serializable Documents
Anyway, I think it's more efficient to write the document to XML form, serialize *that*, and reparse it on the other end than to use Java serialization of Objects. The standard Serializable format on the wire is huge. And not as efficient as people would think due to the use of reflection. arkin -- Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED] -- Assaf Arkin, CTO, Exoffice Technologies, Inc. www.exoffice.com www.exolab.org
Re: Review: Serializer API
That's the explanation. arkin
Jeffrey Rodriguez wrote: Mr. Garbuzov, you just found a bug. In XMLSerializer:
    public void endElement( String namespaceURI, String localName, String rawName )
    {
        ElementState state;

        // Works much like content() with additions for closing
        // an element. Note the different checks for the closed
        // element's state and the parent element's state.
        unindent();
        state = getElementState();
        if ( state.empty ) {
In this method the getElementState() call may return null, so trying to access state.empty would cause a NullPointerException. To fix this bug we need to check whether state is null first. Thanks, Jeffrey Rodriguez, XML4J Support, IBM Cupertino
From: Boris Garbuzov [EMAIL PROTECTED] Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: Review: Serializer API Date: Tue, 21 Mar 2000 16:57:05 -0800
    String unexistingName = "unexistingName";
    documentHandler.endElement (unexistingName);
Even if I misuse the API (I should not call this directly?) it should have failed friendlier than this:
    java.lang.NullPointerException:
        at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
        at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
        at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
-- Assaf Arkin, CTO, Exoffice Technologies, Inc. www.exoffice.com www.exolab.org
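The null check described above might look something like the following. This is a self-contained toy, not the actual XMLSerializer code or the fix that was eventually checked in: the ElementStateStack class, its fields, and the choice of IllegalStateException are all assumptions made for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical miniature of a serializer's element-state bookkeeping.
class ElementStateStack {

    static final class ElementState {
        final String rawName;
        boolean empty = true; // no content written yet, so "<x/>" is still possible
        ElementState(String rawName) { this.rawName = rawName; }
    }

    private final Deque<ElementState> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void startElement(String rawName) {
        ElementState parent = open.peek();
        if (parent != null && parent.empty) {
            out.append('>');      // the parent now has content, so close its start tag
            parent.empty = false;
        }
        out.append('<').append(rawName);
        open.push(new ElementState(rawName));
    }

    void endElement(String rawName) {
        ElementState state = open.peek();
        // The guard the thread asks for: fail with a clear message instead of an NPE.
        if (state == null) {
            throw new IllegalStateException(
                "endElement(\"" + rawName + "\") called with no open element");
        }
        open.pop();
        if (state.empty) {
            out.append("/>");
        } else {
            out.append("</").append(state.rawName).append('>');
        }
    }

    String output() { return out.toString(); }
}
```

With this guard, calling endElement on a fresh instance reports the misuse directly. The sketch does not check that rawName matches the element being closed; a real serializer might want that as well.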
Re: Strange way to handle white spaces during parsing
Norman Walsh wrote: Huh? I guess I wasn't clear. I explicitly constructed a document that was well-formed but not valid. My question/comments had to do with ignorable whitespace in a non-validating parse. Gotcha. In the non-validating case (with or without a DTD), all character content is significant. (I'd have to verify the with-DTD case, though.) -- Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
Re: Strange way to handle white spaces during parsing
Sorry, my mind wasn't in gear. I was pointing out how it could work, but should have pointed out that we of course do the right thing (at least the C++ parser) and always call characters() if we are not validating. Dean Roddey Software Weenie IBM Center for Java Technology - Silicon Valley [EMAIL PROTECTED]
Norman Walsh [EMAIL PROTECTED] on 03/22/2000 11:26:02 AM Please respond to [EMAIL PROTECTED] To: [EMAIL PROTECTED] cc: Subject: Re: Strange way to handle white spaces during parsing
/ [EMAIL PROTECTED] was heard to say: | If a DTD is present, it's read and the information required to make this | decision is present. It doesn't require validation, just a check to see | what type of content model the element has.
I'm not comfortable with that answer at all. I think an option that ignores element whitespace in a non-validating parse is non-standard and potentially dangerous. Consider: The XML 1.0 REC, Section 2.10: An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content. I can't think of any way to interpret that such that a non-validating parse could ignore whitespace. Consider the following example:
    <!DOCTYPE test [
    <!ELEMENT a (b+)>
    <!ELEMENT b (#PCDATA)>
    ]>
    <a>test<b/> <b/>this! 4 or 5?</a>
Does a have four children or five? The answer has to be five. And what about a document with an external subset that has parameter entities that cannot be located, so that the DTD is really half a loaf? Does it ignore whitespace in content models that it found, but not in others? Be seeing you, norm
-- Norman Walsh [EMAIL PROTECTED] | http://nwalsh.com/ | As a general rule, the most successful man in life is the man who has the best information. --Benjamin Disraeli
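The "four or five" question can be checked with a few lines of code. The sketch below assumes Norman's example reads <a>test<b/> <b/>this! 4 or 5?</a> with the internal subset declaring a as (b+) (the archive has stripped the angle brackets, so this is a reconstruction), and it uses the standard JAXP DocumentBuilderFactory in its default non-validating configuration. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check of how many child nodes <a> has in a non-validating parse.
class WhitespaceChildren {

    // Assumed reconstruction of the example document from the message above.
    static final String XML =
        "<!DOCTYPE test [ <!ELEMENT a (b+)> <!ELEMENT b (#PCDATA)> ]>"
        + "<a>test<b/> <b/>this! 4 or 5?</a>";

    static int childCount() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(false); // the default, stated explicitly for clarity
            Document doc = f.newDocumentBuilder()
                            .parse(new InputSource(new StringReader(XML)));
            // Children: text "test", <b/>, text " ", <b/>, text "this! 4 or 5?"
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childCount());
    }
}
```

If the parser passes all character data through, as Section 2.10 requires, the single space between the two b elements is a text node of its own and the count is five. A parser that silently dropped it on the strength of the (b+) content model would report four.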
Configurable missing?
Hi all, It seems that org.xml.sax.Configurable was removed from xerces.jar starting with 1.0.2. Was this intentional, and if so, where should I find it? Bryn Keller Senior Software Engineer Jenkon International [EMAIL PROTECTED]
RE: SVG goes to DOM
This was a FOP message, but you're the DOM experts, so I'd like to get your input. The end result we want is that Scalable Vector Graphics (SVG) be translated to PDF. Keiron's been treating SVG sort of as a special case of XSL:FO, which is why it's been [uncommitted, but still] in FOP. If SVG is going to be DOM based, rather than treated as a special-case form of XSL:FO, then that immediately says "Xerces" to me. Should an implementation of the W3C's SVG DOM be part of Xerces? Does that allow us to do anything cool? Should it continue to be part of FOP? If so, how can we be consistent with the Xerces stuff? -Steve
-----Original Message----- From: Keiron Liddle [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2000 7:08 AM To: fop-dev@xml.apache.org Subject: SVG
I've had a look at the SVG DOM classes. I will be moving all the SVG code into this model once I figure out a few things. This means some major restructuring. This raises some questions. Should the implementation be placed in org.apache.svg.dom.*? Should all implementation classes be called <interface name>Impl.java? Is it possible to have property makers (i.e. propertyTable.put("width", SVGLengthProperty.maker())) that apply only to a particular XML element or maybe XML namespace? If not, then there will be some problems with properties in SVG and FOP that have the same name but need to return different objects (without making Property bloated). Also, in SVG the text element can have a list of x values, so the property should parse into a list; other x values should only be a single number.
COFFMAN Steven wrote: In PDF, SVG, XSL, etc. we're flinging RGB floating point color components around. I'd like to fling a color object around instead. The SVG color object will be the implementation of the org.w3c.dom.svg.SVGColor which holds an RGBColor
static library for unix build?
Are you going to add creation of a static library for Xerces C in some future release? I would rather not use the shared library. Thanks in advance. Dean Hoover
Re: static library for unix build?
No, we do not support a static configuration, nor do we really have any plans to. You are free to do it yourself, but we will not have any officially supported static configuration. Dean Roddey Software Weenie IBM Center for Java Technology - Silicon Valley [EMAIL PROTECTED] Dean Hoover [EMAIL PROTECTED] on 03/22/2000 03:07:59 PM Please respond to [EMAIL PROTECTED] To: [EMAIL PROTECTED] cc: Subject: static library for unix build? Are you going to add creation of a static library for Xerces C in some future release? I would rather not use the shared library. Thanks in advance. Dean Hoover
Re: How to finish/close a document
DTD/Schema access, caching, and re-validation is being investigated. These issues are also on the table for DOM Level 3 discussion. Currently, your method of writing XML and re-parsing - however convoluted - is the easiest. Regards, -Ralf
Difference between Xerces 1.1 and IBM XML4C
Hello, Being fairly new to XML, I was wondering if anyone can answer the following questions. 1. What is the difference between Xerces 1.1.0 and IBM XML4C 3.1? 2. Would you recommend the use of Xerces in mission-critical, real-time applications? 3. What type of support will we have with Xerces 1.1.0? Is there commercial support for Xerces? Thanks, Tri Phan SWIFT