Re: Review: Serializer API - the related failure.

2000-03-22 Thread Boris Garbuzov


My notice might be related to the subject. DOMSerializer serializes newly
created Element and DocumentFragment for me, but refuses the root element,
throwing
 java.lang.ClassCastException: org.apache.xerces.dom.TextImpl
 at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:577)
 at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
 at org.apache.xml.serialize.XMLSerializer.serializeElement(XMLSerializer.java:608)
 at org.apache.xml.serialize.BaseMarkupSerializer.serializeNode(BaseMarkupSerializer.java:827)
 at org.apache.xml.serialize.BaseMarkupSerializer.serialize(BaseMarkupSerializer.java:373)
 at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:397)
Is this a bug or my misuse of the API?
--

Andy Clark wrote:
First, I'd like to look at what's currently in the API and then
discuss some points of design that I'd like to see in the
serializers.
DOMSerializer: I'm sort of surprised that there are methods to
serialize a Document, Element, and DocumentFragment but nothing
for a generic Node. In fact, if you wanted to serialize a text
node or entity reference, you would first have to remove or
clone it into a DocumentFragment and serialize that. And it is
impossible to serialize things like attributes outside of their
container elements. Would it be enough to have the following
method?
 public void serialize(Node node) throws IOException;
Method: I don't see a need for this class. If all it's doing
is holding string constants, then I would say get rid of it
completely. Otherwise, make it a Java "enumeration", like so:
 public class Method implements Serializable {
   // Constants
   public static final Method XML = new Method("text/xml");
   public static final Method HTML = new Method("text/html");
   public static final Method Text = new Method("text/plain");
   // Data
   private String type;
   // Constructors
   protected Method(String type) { this.type = type; }
   // Object methods: equals, hashCode, and toString
 }
But I think that we could do without it altogether and just
make it possible to register new methods with the serializer
factory. But I'll get to that in a minute.
And the type of the method could be the mime type which would
avoid the need of a set/getMediaType on the OutputFormat object.
And if this thing is really representing the mime type, perhaps
it should be called such instead of "Method". It would tie in
better with existing standards.
OutputFormat: It seems like a good idea to have a kind of
properties object like OutputFormat. But it seems that the
OutputFormat (and in fact the whole serializer API) is based
on serializing to a text markup syntax. This sort of jumps
the gun on what I'd like to say in general about the
serialization API so I won't go any further at this point.
Check out my comments below regarding this matter.
Serializer: I noticed that this design makes use of the SAX
interfaces but not of the traversal APIs added with DOM Level
2. Is there a way that we could leverage those interfaces?
SerializerFactory: There's no way to dynamically register
OutputMethods or Serializers. I think that there should be
a way to do this.
And overall, I'm not sure if we'd be allowed to drop stuff
into the org.xml package namespace. Arkin: have you checked
on this? And will any of this be superseded by DOM Level 3?
At least on the DOM serialization side, that is... Perhaps
Arnaud or someone else on the W3C committee can shed light
on this.
Okay, now I'd like to make a few comments about what I'd
like to see in a serialization API. First, I don't strictly
see serialization as an output to some text markup. As
such, I would like a split between binary and character
serializers. Currently, there are both setOutputStream()
and setWriter() methods on the Serializer objects. If
possible, I'd like setOutputStream() to be only on binary
serializers and setWriter() be used on character serializers.
All of the current serializer implementations (XML, HTML,
XHTML) would be character serializers and the OutputFormat
object seems to go very well with this. On the binary side,
however, I can see a situation where SVG gets serialized to
a JPEG image. I realize that this overlaps XSL Formatting
Objects, though. Perhaps a better example would be an XML
serializer that outputs to WBXML.
Is anyone else thinking along these lines?
--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

--
Boris Garbuzov.
Mailing address:
Box 715, Seattle, Washington, 98111-0715, USA.
E-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]
Telephone: 1(206)781-5165 (home), 1(206)576-4549 (office).
Residential address: 139 NW 104 Street, Seattle, 98177, WA, USA



Re: Review: Serializer API

2000-03-22 Thread Wong Kok Wai
Here's a good tutorial on how to use the serialiser:
http://metalab.unc.edu/xml/slides/sd2000west/xmlandjava/185.html

Boris Garbuzov wrote:

 Even if I misuse the API (I should not call this directly?) it should have 
 failed friendlier than this:




Re: Deep copy of node from other document

2000-03-22 Thread Andy Heninger
The method you are looking for is Document.importNode().
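
For illustration, here is a minimal Java sketch of a deep copy with Document.importNode(). It assumes a JAXP parser is available; the class name ImportNodeSketch and its helper methods are hypothetical names made up for this example. The second argument, true, asks for the whole subtree to be copied. The source node stays in its original document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class ImportNodeSketch {

    // Creates an empty document with the default JAXP parser.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deep-copies 'source' (owned by another document) into 'target'
    // and appends the copy under the target's root element.
    static Node copyInto(Document target, Node source) {
        Node copy = target.importNode(source, true); // true = copy the whole subtree
        target.getDocumentElement().appendChild(copy);
        return copy;
    }
}
```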

  -- Andy


- Original Message -
From: Martin Duerig [EMAIL PROTECTED]

 Sorry for asking novice questions: What is the best strategy to deep copy a
 node from one document to another one?





Re: createDocument factory methods

2000-03-22 Thread Andy Heninger
The new API for document creation in DOM level 2 is not quite as succinct as
the old static createDocument() method from the Xerces-c DOM level 1
implementation, but it's not quite as bad as all that.

Here's some sample code, taken from the CreateDOMDocument sample included
with Xerces 1.1,

DOM_DOMImplementation impl;
DOM_Document doc = impl.createDocument(
        0,                    // root element namespace URI.
        "company",            // root element name
        DOM_DocumentType());  // document type object (DTD).


The old method will stay around for a while, for compatibility purposes, but
any new code should use the new DOM_DOMImplementation::createDocument()
function.
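
For readers on the Java side, roughly the same DOM Level 2 call can be sketched as below. This is an analogy, not the Xerces-C sample: it assumes a JAXP parser, and the class name CreateDocumentSketch is hypothetical. Passing null for the namespace URI and the document type mirrors the 0 and the empty DOM_DocumentType() above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

final class CreateDocumentSketch {

    // Creates a new document whose root element is <company>.
    static Document createCompanyDocument() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            // namespace URI = null, qualified root name = "company", doctype = null
            return impl.createDocument(null, "company", null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```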


   -- Andy


- Original Message -
From: Blaine Brodie [EMAIL PROTECTED]

 In the xerces-c DOM_Document interface, I noticed that there was a static
 factory method called createDocument.  The comment with this function
 states that it was added because
 the DOM API lacks a mechanism for the creation of new documents.  However
 the DOM2 spec. now has a createDocument function in the
 DOM_DOMImplementation interface which takes in a name space URI, qualified
 name and document type parameters, but is not static.  To use this new
 createDocument function without a reference to an existing
 DOMImplementation, one must use something similar to the following:

 DOM_Document d =
     DOM_Document::createDocument().getDOMImplementation().createDocument(nsURI, qn, dt);

 This seems unintuitive.  Are there any plans to either make the new
 createDocument static and/or move the old DOM_Document factory method over
 to DOM_DOMImplementation?





Re: Review: Serializer API

2000-03-22 Thread Jeffrey Rodriguez
Mr. Garbuzov, you just found a bug.
In XMLSerializer:
public void endElement( String namespaceURI, String localName,
                        String rawName )
{
    ElementState state;
    // Works much like content() with additions for closing
    // an element. Note the different checks for the closed
    // element's state and the parent element's state.
    unindent();
    state = getElementState();
    if ( state.empty ) {
In this method the getElementState() call may return null,
so accessing state.empty throws a NullPointerException.
To fix this bug we need to check whether state is null first.
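
The kind of guard meant here can be sketched as follows. This is not the XMLSerializer source: EndElementGuardSketch and its ElementState are hypothetical stand-ins that only model a stack of open elements, and throwing IllegalStateException is one assumed way of "failing friendlier".

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class EndElementGuardSketch {

    // Hypothetical stand-in for the serializer's per-element state.
    static final class ElementState {
        final String rawName;
        boolean empty = true; // no content written yet
        ElementState(String rawName) { this.rawName = rawName; }
    }

    private final Deque<ElementState> states = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void startElement(String rawName) {
        ElementState parent = states.peek();
        if (parent != null && parent.empty) {
            out.append('>');       // close the parent's start tag
            parent.empty = false;
        }
        out.append('<').append(rawName);
        states.push(new ElementState(rawName));
    }

    void endElement(String rawName) {
        ElementState state = states.peek(); // null when no element is open
        if (state == null) {
            // The guard: report the misuse instead of dereferencing null.
            throw new IllegalStateException(
                "endElement(\"" + rawName + "\") called with no open element");
        }
        states.pop();
        if (state.empty) {
            out.append("/>");
        } else {
            out.append("</").append(state.rawName).append('>');
        }
    }

    String output() { return out.toString(); }
}
```

With this guard, an unmatched endElement call reports the misuse by name rather than surfacing as a NullPointerException deep inside the serializer.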
Thanks,
   Jeffrey Rodriguez
   XML4J Support
   IBM Cupertino


From: Boris Garbuzov [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Review: Serializer API
Date: Tue, 21 Mar 2000 16:57:05 -0800
String unexistingName = "unexistingName";
documentHandler.endElement (unexistingName);
Even if I misuse the API (I should not call this directly?) it should have 
failed friendlier than this:

java.lang.NullPointerException:
 at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
 at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
 at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)


__
Get Your Private, Free Email at http://www.hotmail.com


Re: Review: Serializer API

2000-03-22 Thread Jeffrey Rodriguez
Andy Clark wrote:
DOMSerializer: I'm sort of surprised that there are methods to
serialize a Document, Element, and DocumentFragment but nothing
for a generic Node. In fact, if you wanted to serialize a text
node or entity reference, you would first have to remove or
clone it into a DocumentFragment and serialize that. And it is
impossible to serialize things like attributes outside of their
container elements. Would it be enough to have the following
method?
  public void serialize(Node node) throws IOException;
Yes, I agree with Andy. A
public void serialize(Node node) throws IOException;
method may be all we need to support all the Node objects including
Document, Element, DocumentFragment, etc.
Look at the way we implement the dom.DOMWriter
public void print(Node node) method.
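
As a rough illustration of what a single serialize(Node) entry point could look like, here is a sketch that dispatches on the node type. It is not the DOMWriter sample or the proposed API: NodeSerializerSketch is a hypothetical name, only a few node types are handled, and the escaping is deliberately minimal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeSerializerSketch {

    // Serializes any supported node: document, fragment, element, attribute, or text.
    static String serialize(Node node) {
        StringBuilder sb = new StringBuilder();
        write(node, sb);
        return sb.toString();
    }

    private static void write(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE:
                writeChildren(node, sb);
                break;
            case Node.ELEMENT_NODE: {
                sb.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    sb.append(' ');
                    write(attrs.item(i), sb);
                }
                if (node.hasChildNodes()) {
                    sb.append('>');
                    writeChildren(node, sb);
                    sb.append("</").append(node.getNodeName()).append('>');
                } else {
                    sb.append("/>");
                }
                break;
            }
            case Node.ATTRIBUTE_NODE:
                sb.append(node.getNodeName()).append("=\"")
                  .append(escape(node.getNodeValue())).append('"');
                break;
            case Node.TEXT_NODE:
                sb.append(escape(node.getNodeValue()));
                break;
            default:
                break; // other node types are skipped in this sketch
        }
    }

    private static void writeChildren(Node parent, StringBuilder sb) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, sb);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Test helper: parses a string with the default JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```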

Method: I don't see a need for this class. If all it's doing
is holding string constants, then I would say get rid of it
completely. Otherwise, make it a Java enumeration, like so:
  public class Method implements Serializable {
    // Constants
    public static final Method XML = new Method("text/xml");
    public static final Method HTML = new Method("text/html");
    public static final Method Text = new Method("text/plain");
    // Data
    private String type;
    // Constructors
    protected Method(String type) { this.type = type; }
    // Object methods: equals, hashCode, and toString
  }
But I think that we could do without it altogether and just
make it possible to register new methods with the serializer
factory. But I'll get to that in a minute.
I will vote for doing without it if we can.

And the type of the method could be the mime type which would
avoid the need of a set/getMediaType on the OutputFormat object.
And if this thing is really representing the mime type, perhaps
it should be called such instead of Method. It would tie in
better with existing standards.
Yes, I think we should rename this class to MimeType and we should
use the mime types as suggested by Andy.

Serializer: I noticed that this design makes use of the SAX
interfaces but not of the traversal APIs added with DOM Level
2. Is there a way that we could leverage those interfaces?
I think this idea has a lot of potential. Using a TreeWalker and
allowing a filter to be registered, the serializer could be customized
to output only user-selected nodes.
A user could select and serialize a view of the DOM this way.
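
A sketch of the filtering half of that idea, using the DOM Level 2 traversal interfaces: FilteredWalkSketch and the element names in the usage note are hypothetical, and the cast to DocumentTraversal assumes the DOM implementation supports the Traversal module (the Xerces-derived parser bundled with the JDK does). A serializer could then write out only the nodes the walker returns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

final class FilteredWalkSketch {

    // Returns the names of all elements visible in the filtered view,
    // in document order. Elements named 'hiddenName' are rejected
    // together with their whole subtree.
    static List<String> visibleElementNames(Document doc, final String hiddenName) {
        NodeFilter filter = new NodeFilter() {
            public short acceptNode(Node n) {
                return n.getNodeName().equals(hiddenName)
                        ? NodeFilter.FILTER_REJECT
                        : NodeFilter.FILTER_ACCEPT;
            }
        };
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, false);
        List<String> names = new ArrayList<>();
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    // Test helper: parses a string with the default JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, walking `<invoice><item/><secret><key/></secret><item/></invoice>` with "secret" hidden visits invoice, item, item and never reaches key.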

SerializerFactory: There's no way to dynamically register
OutputMethods or Serializers. I think that there should be
a way to do this.
I agree, I think that the SerializerFactory could allow dynamic
registration of OutputMethods and/or Serializers.

And overall, I'm not sure if we'd be allowed to drop stuff
into the org.xml package namespace. Arkin: have you checked
Yes, we need to figure this out. I think the safest way would be
to move this package to the org.apache.xerces namespace.
Andy, by the way, this package is in org.apache.xml, not
org.xml (am I missing something here?).
Okay, now I'd like to make a few comments about what I'd
like to see in a serialization API. First, I don't strictly
see serialization as an output to some text markup. As
such, I would like a split between binary and character
serializers. Currently, there are both setOutputStream()
and setWriter() methods on the Serializer objects. If
possible, I'd like setOutputStream() to be only on binary
serializers and setWriter() be used on character serializers.
I agree with Andy here too. I think that we should be able to have
binary and character serializers.

All of the current serializer implementations (XML, HTML,
XHTML) would be character serializers and the OutputFormat
object seems to go very well with this. On the binary side,
however, I can see a situation where SVG gets serialized to
a JPEG image. I realize that this overlaps XSL Formatting
Arghh!! Excuse me!! SVG getting serialized to a JPEG image is quite
an interesting idea! What is next, SMIL files to native RealAudio
files?
Objects, though. Perhaps a better example would be an XML
serializer that outputs to WBXML.
Yes, this is a good idea and I would like to see it implemented. WBXML
(binary XML) has a lot of potential, especially for embedded devices, cellular
phone applications, PDAs, etc.

Jeffrey Rodriguez
XML Development
IBM Cupertino


Re: [Fwd: Help!]

2000-03-22 Thread Andrea Schneider
[EMAIL PROTECTED] wrote:
 
 Did the larger XML document work before?  My guess is that you need to
 increase the default heap size for java.exe.  The default heap size is 16MB
 for JDK 1.1.8.  So, with 90,000 lines of XML you'll suck up a big portion of
 that heap.
 
 Try increasing the heap size using the -mx parameter, e.g. for a 64MB
 heap:
 
 java -mx64m org.apache.xalan.xslt.Process -in keeper.xml -xsl keeperhtml.xsl
 
 -Rob
 
 
 From: Andrea Schneider [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 Date: 03/21/00 11:34 AM
 Please respond to xerces-dev
 To: [EMAIL PROTECTED]
 cc: (bcc: Robert Weir/CAM/Lotus)
 Subject: [Fwd: Help!]
 

Hello Rob,
thank you for the quick answer. I tried it, and at first it seemed to be
the right approach, but when I increased the heap to 128MB I hit a limit.
In this first test I want to write 100 invoices; the limit is reached at
68 invoices.
Is there another way to reduce the memory requirements?

-Andrea


Re: [Xerces-C] Patches for OS/2 port

2000-03-22 Thread Bill Schindler
[EMAIL PROTECTED] wrote:
 Thanks Bill. We'll try to get those in next week.

So, uh... Are those patches going to make it in?



--Bill


Strange way to handle white spaces during parsing

2000-03-22 Thread Jean-Christophe Broudin
Hi,

We have noticed that, with an XML file such as:

<A>
<B>
<C> Bono </C>
</B>
</A>

if one asks how many children the A element has (using getLength() on
getDocumentElement().getChildNodes() ), the answer is
 3. After investigation, the whitespace between <A> and <B>, then between
</B> and </A>, is interpreted as text nodes.

On the same example, the XML parser included in IE 5.0 ignores these
whitespaces and the answer is 1. We think that 1 is the correct answer.

I really wonder:

Isn't it weird? 
Is it the normal behavior? 
Is there a means to disable this, and have an IE-5.0-like 
behaviour? 
Is it an actual compliance to XML and DOM specs?


Regards,

jean-christophe broudin.


Re: Status of COM wrapper (was xerces for Delphi)

2000-03-22 Thread Franz-Leo Chomse
On Wed, 22 Mar 2000 09:52:17 -0500, [EMAIL PROTECTED] wrote:

 

IBM recently donated a COM wrapper of Xerces to Apache.  It looks like the
source are checked into CVS, but no binary distribution has yet been
released.  Other than that, I don't know the status.

-Rob

Thank's

Franz-Leo



Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Norman Walsh
/ Jean-Christophe Broudin [EMAIL PROTECTED] was heard to say:
|   We have noticed that, with an XML file such as:
| 
|   <A>
|   <B>
|   <C> Bono </C>
|   </B>
|   </A>
| 
|   if one asks how many children the A element has (using getLength() on
| getDocumentElement().getChildNodes() ), the answer is
|  3. After investigation, the whitespace between <A> and <B>, then between
| </B> and </A>, is interpreted as text nodes.
| 
|   On the same example, the XML parser included in IE 5.0 ignores these
| whitespaces and the answer is 1. We think that 1 is the correct answer.

The correct answer is three. 


Fwd: Strange way to handle white spaces during parsing

2000-03-22 Thread Jeffrey Rodriguez

I really wonder:
		Isn't it weird?
No, it is not. Read section 2.10 of the XML 1.0 Recommendation.

		Is it the normal behavior?
Yes, it is.
		Is there a means to disable this, and have an IE-5.0-like behaviour?
Yes, set the "http://apache.org/xml/features/dom/include-ignorable-whitespace"
feature to false (the default is true, thus the white space is
included).
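
A sketch of the same switch through JAXP, where it is exposed as setIgnoringElementContentWhitespace(): this is an assumption-laden illustration, not the Xerces feature call itself. The class name IgnorableWhitespaceSketch and the sample DTD are made up, and validation is turned on because the JAXP documentation ties this setting to validating mode. The parser can only tell that the whitespace is ignorable because the DTD declares element content for A and B.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class IgnorableWhitespaceSketch {

    // The example document from this thread, plus a hypothetical DTD
    // declaring that A and B have element content only.
    static final String SAMPLE =
          "<!DOCTYPE A [\n"
        + "<!ELEMENT A (B)>\n"
        + "<!ELEMENT B (C)>\n"
        + "<!ELEMENT C (#PCDATA)>\n"
        + "]>\n"
        + "<A> \n<B>\n<C> Bono </C>\n</B>\n</A>";

    // Counts the children of the root element, optionally dropping
    // whitespace that appears in element content.
    static int rootChildCount(String xml, boolean dropIgnorable) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(dropIgnorable);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat validity errors as failures
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the whitespace kept, the root has three children (text, B, text); with it dropped, only B remains.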

		Is it an actual compliance to XML and DOM specs?
Are you asking whether we are compliant with the XML and DOM specs? Yes, we are.
Again read the spec.

Regards,
	jean-christophe broudin.
Thanks,
  Jeffrey Rodriguez
  XML Development
  IBM Cupertino


Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Norman Walsh
/ Juergen Hermann [EMAIL PROTECTED] was heard to say:
|  Is there a means to disable this, and have an IE-5.0-like behaviour?
| 
| Yes. Write and use a DTD, so the parser knows that A does not contain mixed 
| content. Note that this does not mean that you need to use the validating 
| parser.

The Xerces non-validating parser flags ignorable whitespace even when
it's not validating? How does it know? :-)

Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED]  | It is seldom that any liberty is
http://nwalsh.com/ | lost all at once.--David Hume



[Xerces-J] Serializable Documents

2000-03-22 Thread kito_mann
I noticed that DocumentImpl implements Serializable (as does NodeImpl), but
several other classes (like DocumentTypeImpl) don't. I'm trying to send a
DocumentImpl back from an EJB server, and obviously it's not working. I'm
assuming that the preferred method of doing this is just to use the
serializers.  I was hoping to avoid reconstructing the DOM tree. Any
thoughts?

Kito D. Mann
Virtua Communications Corp.




Bug in tutorial

2000-03-22 Thread Boris Garbuzov
Thanks to those who pointed me to the tutorial at
http://metalab.unc.edu/xml/slides/sd2000west/xmlandjava/189.html.
Just the code lines
BigInteger low  = BigInteger.ZERO;
BigInteger high = BigInteger.ONE;
need to be changed into
BigInteger low  = new BigInteger ("0");
BigInteger high = new BigInteger ("1");
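
A small check of the replacement, on the assumption that the change is needed because the public BigInteger.ZERO and BigInteger.ONE constants are missing from the JDK in use (as far as I know they were only added in JDK 1.2). The class name is hypothetical. BigInteger.valueOf(long) would be another option.

```java
import java.math.BigInteger;

final class BigIntegerConstantsSketch {
    // String-based construction; does not rely on the ZERO/ONE constants.
    static final BigInteger LOW  = new BigInteger("0");
    static final BigInteger HIGH = new BigInteger("1");
}
```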




RE: SVG goes to DOM

2000-03-22 Thread Arnold, Curt
Steven Coffman wrote:
If SVG is going to be DOM based, rather than treated as a special case form
of XSL:FO, then that immediately says "Xerces" to me. Should an
implementation of the W3C's SVG DOM be part of Xerces? Does that allow us to
do anything cool? Should it continue to be part of FOP? If so, how can we be
consistent with the Xerces stuff?

--

I don't think that there should be any SVG-specific code in the
Xerces DOM implementation.  That said, if you were able to create
an SVGDOMDocument that could override the constructors of NodeImpl
et al., you might be able to allow anyone to create extended DOMs
without cutting and pasting the existing DOM code.

On kind of a related and not fully fleshed out thought, the xml-dev
list recently had a discussion on pros and cons of SVG's use of
long attributes for path data.  One of the key performance issues
that swayed the decision for the attribute form was that if the
path was described as elements, the DOM grew so large that memory use
and performance were unacceptable.

In effect, the switch from XML representation to a microparsed 
representation was a way to trick the DOM not to fully expand
the document to objects.  The SVGPathData interface is then an
attempt to gain some of the lost functionality.

Basically, that suggested to me that if you were able to hint that
at a certain place in the document, a flyweight implementation of Node
were used (child content was held as a single string, flyweight implementations
of Node, Attribute were mapped onto the string on use), you could
have the best of both worlds, keep the XML representation and DOM access
while avoiding the memory bloat of fully expanding the tree.





Re: [Xerces-C] Patches for OS/2 port

2000-03-22 Thread roddey



Did you mail me the OS/2 platform/compiler files? If you did, then my tiny
brain must have deleted that message by mistake since the last message I
see from you in my inbox is the big diff message you sent. Can you send me
the OS/2 files in a zip and I'll pass them along to Rahul to integrate.

Sorry if you already sent them, I promise I'll get it right this time.


Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Bill Schindler [EMAIL PROTECTED] on 03/22/2000 01:48:35 AM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Re: [Xerces-C] Patches for OS/2 port



[EMAIL PROTECTED] wrote:
 Thanks Bill. We'll try to get those in next week.

So, uh... Are those patches going to make it in?



--Bill





Re: [Xerces-C] Patches for OS/2 port

2000-03-22 Thread roddey



Bill,

Can you just send your OS/2 files to Jeffrey? He has an OS/2 development
machine setup and can test out your changes and he has commit access. So
mail them to:

 [EMAIL PROTECTED]

If you've already mailed them to me, that's ok. I'll just vector them to
him. But if you see this message first, just send them on to Jeffrey.


Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]




Re: change of Node.getOwnerDocument()

2000-03-22 Thread Ralf I. Pfeiffer
Yes. This was changed to be conformant to the spec.
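
A quick way to see the spec behaviour: getOwnerDocument() is null for the Document node itself, while every node created by that document reports it as owner. The sketch assumes a JAXP parser; OwnerDocumentSketch and documentOf() are hypothetical names for a helper that papers over the difference.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class OwnerDocumentSketch {

    // Creates an empty document with the default JAXP parser.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the document a node belongs to, treating a Document as its own owner.
    static Document documentOf(Node node) {
        if (node.getNodeType() == Node.DOCUMENT_NODE) {
            return (Document) node;    // getOwnerDocument() would return null here
        }
        return node.getOwnerDocument();
    }
}
```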

Boris Garbuzov wrote:

 I am playing with Xerces API as if I unit-tested it on a basic level:
 applying every available method in a way so that just it does not fail.
 And I noticed some change of Node.getOwnerDocument() after downloading
 1.0.3. Now it returned null for the node = document whereas it had
 returned a proper document in earlier version. Seems, this later one is
 according to specification that always existed? It used to be a bug?

--
<person name="Ralf I. Pfeiffer"
 loc="IBM JTC, Cupertino, CA"
 email1="[EMAIL PROTECTED]"
 email2="[EMAIL PROTECTED]"
/>




RE: Generate XML from DTD

2000-03-22 Thread Watson, Tom
Sangita Gupta wrote:
I wish to write Java code which can generate an XML document based on
a given DTD: walk through the DTD, know which tags (names) to include in
the XML document, grab the values from a string and generate the document.
Is it possible? Any example will be greatly appreciated.

I have been researching this also and there would seem to be a lot of demand
since lots of corporate data is in flat files.  I have basically seen three
approaches:

1. Create a java application that reads in your DTD, reads in your data (a
line at a time)  and then creates the internal DOM structure using those two
sources.  Once you have the internal DOM built, you can serialize the XML as
an output stream.

Advantages - everything is done in one program
Disadvantages - fairly complex and dependent upon a specific implementation
of parser (IBM xml4j version 2 API) 

Take a look at "XML and Java: Building Web Applications" by Maruyama, Tamura
and Uramoto...chapter 3.

2. (courtesy of Doug Tidwell)
A simpler approach would be a Java application that reads in your data and
simply creates XML tags associated with each column.  Something like this:

<?xml version="1.0"?>
<document>
  <row>
    <column1>x</column1>
    <column2>x</column2>
    ...
  </row>
  ... more rows ...
</document>

then, you use xalan and stylesheets to render this XML format into a
format suitable for your particular application, where you do conversion of
columns to actual meaningful tags:

...
<employees>
  <employee sex="F">
    <serial_number>00</serial_number>
    <name>
      <first_name>john</first_name>
      <last_name>doe</last_name>
    </name>
    ...
  
the last step is to run it against the parser to validate against your DTD

Advantage - much simpler and parser independence
Disadvantage - more programs/steps/coordination required
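
The first half of approach 2 can be sketched as below: one row element per input line, one columnN element per field. FlatFileToXmlSketch, the delimiter parameter and the sample data in the usage note are assumptions for illustration; mapping columnN to meaningful tags is left to the stylesheet step.

```java
import java.util.List;
import java.util.regex.Pattern;

final class FlatFileToXmlSketch {

    // Converts delimited lines into the generic row/column XML format.
    static String toXml(List<String> lines, String delimiter) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<document>\n");
        for (String line : lines) {
            sb.append("  <row>\n");
            // -1 keeps trailing empty fields so column numbering stays stable
            String[] cols = line.split(Pattern.quote(delimiter), -1);
            for (int i = 0; i < cols.length; i++) {
                String tag = "column" + (i + 1);
                sb.append("    <").append(tag).append('>')
                  .append(escape(cols[i]))
                  .append("</").append(tag).append(">\n");
            }
            sb.append("  </row>\n");
        }
        return sb.append("</document>\n").toString();
    }

    // Escapes the characters that are not allowed in element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, toXml(Arrays.asList("00|john|doe"), "|") yields a single row with column1 through column3.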

3. buy from a vendor (I am researching this but have not found much).


Tom Watson


Re: Review: Serializer API

2000-03-22 Thread Andy Clark
How about changing Method not to MimeType but rather to ContentType?
A construct I use *very* often in my HTML files is the following:

  <meta http-equiv='content-type' content='text/html; charset=x-sjis'>

Which would correspond to an equivalent response line from a web
server or a header line in an email message:

  Content-Type: text/html; charset=x-sjis

Then the ContentType could be the one used to specify the encoding
of the output stream. Which brings me back to my binary vs.
character serializer comment...

Clearly, in Java, OutputStream is for binary output and Writer is 
for character output. However, what's implied is that the program
has constructed the appropriate writer that converts the Unicode
characters to the appropriate byte sequences in that encoding. So
a Writer object is really writing to an OutputStream. 

But by changing character serializers to only support Writer, it
would appear that we're putting the onus on the programmer to
both specify the charset for the encoding type (so that any
reference to the charset in the output is correct -- e.g. the
XML encoding names, IANA, are *not* the same as the Java encoding
names) *and* create an output writer with the appropriate Java
encoding! However, I think that we can work through this by
providing a convenience mapping that creates the appropriate
writer from a given output stream. Here's a quick example:

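  import java.io.OutputStream;
  import java.io.OutputStreamWriter;
  import java.io.UnsupportedEncodingException;
  import java.io.Writer;

  public class ContentType {

    // Data
    protected String type;
    protected String charset;

    // Constructors
    public ContentType(String type, String charset) {
      this.type = type;
      this.charset = charset;
    }

    // Public methods
    public Writer createWriter(OutputStream out)
      throws UnsupportedEncodingException {

      // Map the charset name to the Java encoding name. Only one
      // illustrative entry is shown; a real table would be larger.
      String javaEncoding =
        "x-sjis".equalsIgnoreCase(charset) ? "SJIS" : charset;
      return new OutputStreamWriter(out, javaEncoding);
    }

    // ... etc ...

  }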

What do you think?

Or maybe a static method would be better? Hmmm... I'm just
brainstorming here. I'd really like to hear other people's
opinions.

(I noticed that Sun's JavaMail extension has something
similar: javax.mail.internet.ContentType)

-- 
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]


Re: Generate XML from DTD

2000-03-22 Thread Sangita Gupta
Thanks Tom. This gives me something to chew on. I will research it further and
keep you posted with the latest.
Sangita

Watson, Tom wrote:

 I have been researching this also and there would seem to be a lot of demand
 since lots of corporate data is in flat files.  I have basically seen three
 approaches:

 1. Create a java application that reads in your DTD, reads in your data (a
 line at a time)  and then creates the internal DOM structure using those two
 sources.  Once you have the internal DOM built, you can serialize the XML as
 an output stream.

 Advantages - everything is done in one program
 Disadvantages - fairly complex and dependent upon a specific implementation
 of parser (IBM xml4j version 2 API)

 Take a look at "XML and Java: Building Web Applications" by Maruyama, Tamura
 and Uramoto...chapter 3.

 2. (courtesy of Doug Tidwell)
 a simpler approach would be a Java application that reads in your data and
 simply creates XML tags associated with each column.  Something like this:

 <?xml version="1.0"?>
 <document>
   <row>
     <column1>x</column1>
     <column2>x</column2>
     ...
   </row>
   ... more rows ...
 </document>

 then, you use xalan and stylesheets to render this XML format into a
 format suitable for your particular application, where you do conversion of
 columns to actual meaningful tags:

 ...
 <employees>
   <employee sex="F">
     <serial_number>00</serial_number>
     <name>
       <first_name>john</first_name>
       <last_name>doe</last_name>
     </name>
     ...

 the last step is to run it against the parser to validate against your DTD

 Advantage - much simpler and parser independence
 Disadvantage - more programs/steps/coordination required

 3. buy from a vendor (I am researching this but have not found much).

 Tom Watson



Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Norman Walsh
/ [EMAIL PROTECTED] was heard to say:
| If a DTD is present, it's read and the information required to make this
| decision is present. It doesn't require validation, just a check to see
| what type of content model the element has.

I'm not comfortable with that answer at all. I think an option that
ignores element whitespace in a non-validating parse is non-standard
and potentially dangerous. Consider:

  The XML 1.0 REC, Section 2.10:

  An XML processor must always pass all characters in a document
  that are not markup through to the application. A validating
  XML processor must also inform the application which of these
  characters constitute white space appearing in element
  content.

I can't think of any way to interpret that such that a
non-validating parse could ignore whitespace.

Consider the following example:

<!DOCTYPE test [
<!ELEMENT a (b+)>
<!ELEMENT b (#PCDATA)>
]>
<a>test<b/> <b/>this! 4 or 5?</a>

Does a have four children or five? The answer has to be five.
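
The count can be checked with a plain non-validating parse. This sketch assumes the default JAXP parser with its default settings (element-content whitespace is kept); MixedContentCountSketch is a hypothetical name. The five children are the text "test", a b element, a single space, another b element, and the trailing text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MixedContentCountSketch {

    // Well-formed but not valid: the DTD claims that a contains only b elements.
    static final String SAMPLE =
          "<!DOCTYPE test [\n"
        + "<!ELEMENT a (b+)>\n"
        + "<!ELEMENT b (#PCDATA)>\n"
        + "]>\n"
        + "<a>test<b/> <b/>this! 4 or 5?</a>";

    // Parses without validation and counts the root element's children.
    static int rootChildCount(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```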

And what about a document with an external subset that has
parameter entities that cannot be located, so that the DTD is
really half a loaf. Does it ignore whitespace in content models
that it found, but not in others?

Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED]  | As a general rule, the most
http://nwalsh.com/ | successful man in life is the man
   | who has the best
   | information.--Benjamin Disraeli



Re: Xerces bug: base URI and external parsed entities

2000-03-22 Thread Andy Clark
Norman Walsh wrote:
 Is it possible to access the URI of the document currently being
 parsed from XMLDTDScanner.scanEntityDecl()?

I think you can query the XMLEntityHandler for this information.
XMLDTDScanner has a reference to it.

-- 
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]


Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Andy Clark
Norman Walsh wrote:
 <!DOCTYPE test [
 <!ELEMENT a (b+)>
 <!ELEMENT b (#PCDATA)>
 ]>
 <a>test<b/> <b/>this! 4 or 5?</a>

<!ELEMENT a (#PCDATA|b)*>
<!ELEMENT b EMPTY>

-- 
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]


Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Norman Walsh
/ Andy Clark [EMAIL PROTECTED] was heard to say:
| Norman Walsh wrote:
|  <!DOCTYPE test [
|  <!ELEMENT a (b+)>
|  <!ELEMENT b (#PCDATA)>
|  ]>
|  <a>test<b/> <b/>this! 4 or 5?</a>
| 
| <!ELEMENT a (#PCDATA|b)*>
| <!ELEMENT b EMPTY>

Huh? I guess I wasn't clear. I explicitly constructed a document
that was well-formed but not valid. My question had to do
with ignorable whitespace in a non-validating parse.

The non-validating parser might think that a had element content
and imagine that it could throw whitespace away. But it would be
wrong.

Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED]  | The shortness of life can neither
http://nwalsh.com/ | dissuade us from its pleasures,
   | nor console us for its
   | pains.--Vauvenargues



Re: [Xerces-J] Serializable Documents

2000-03-22 Thread Andy Clark
[EMAIL PROTECTED] wrote:
 I noticed that DocumentImpl implements Serializable (as does NodeImpl), but
 several other classes (like DocumentTypeImpl) don't. I'm trying to send a
 DocumentImpl back from an EJB server, and obviously it's not working. I'm
 assuming that the preferred method of doing this is just to use the
 serializers.  I was hoping to avoid reconstructing the DOM tree. Any
 thoughts?

I don't understand this one because I am able to serialize a 
document (the standard Serializable way) and read it back in 
again w/o errors. Take a look at the sample program I've
attached to this message.

What is your EJB server doing differently about serializing
these objects?

Anyway, I think it's more efficient to write the document to
XML form, serialize *that*, and reparse it on the other end 
than to use Java serialization of Objects. The standard
Serializable format on the wire is huge.
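
The XML-on-the-wire alternative can be sketched with the standard JAXP transformer: write the DOM to a string (a String is itself Serializable, so it can be returned across a remote call), then reparse it on the receiving side. XmlWireFormatSketch and its method names are hypothetical, and this is not the attached Serializable.java sample.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XmlWireFormatSketch {

    // Sender side: turn the DOM tree into XML text.
    static String toXml(Document doc) {
        try {
            StringWriter w = new StringWriter();
            // A transformer created without a stylesheet performs an identity copy.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: rebuild a DOM tree from the XML text.
    static Document fromXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is a reparse on the receiving end, which is the tree reconstruction the original poster hoped to avoid.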

-- 
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

Serializable.java
Description: application/unknown-content-type-java_auto_file


Re: Review: Serializer API

2000-03-22 Thread Arkin
Andy Clark wrote:
 
 First, I'd like to look at what's currently in the API and then
 discuss some points of design that I'd like to see in the
 serializers.
 
 DOMSerializer: I'm sort of surprised that there are methods to
 serialize a Document, Element, and DocumentFragment but nothing
 for a generic Node. In fact, if you wanted to serialize a text
 node or entity reference, you would first have to remove or
 clone it into a DocumentFragment and serialize that. And it is
 impossible to serialize things like attributes outside of their
 container elements. Would it be enough to have the following
 method?
 
   public void serialize(Node node) throws IOException;

I tried to stick to the W3C model which defines a document or a document
fragment, so if you want to print just an element, I think it makes
sense to use a document fragment.

As for serializing Node that happens to be an Attribute, keep in mind
that we're trying to define an API used by a lot of serializers. The
question that should be raised is: would it be trivial for them to
support it? Would a PDF serializer support that?
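
For a text-markup serializer, at least, a generic serialize(Node) looks cheap.
The following is only a sketch to show the shape of the dispatch; NodeWriter
and its helper methods are hypothetical names, not part of the proposed API,
and it says nothing about what a PDF serializer would do with a lone
attribute.

```java
import java.io.IOException;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical sketch of serialize(Node) for XML text output only.
class NodeWriter {
    /** Writes any supported node type as XML text to the given sink. */
    static void serialize(Node node, Appendable out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE:
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE: {
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    out.append(' ');
                    serialize(attrs.item(i), out);
                }
                if (node.hasChildNodes()) {
                    out.append('>');
                    writeChildren(node, out);
                    out.append("</").append(node.getNodeName()).append('>');
                } else {
                    out.append("/>");
                }
                break;
            }
            case Node.ATTRIBUTE_NODE:
                // An attribute outside its element is written as name="value".
                out.append(node.getNodeName()).append("=\"")
                   .append(escape(node.getNodeValue(), true)).append('"');
                break;
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue(), false));
                break;
            case Node.ENTITY_REFERENCE_NODE:
                out.append('&').append(node.getNodeName()).append(';');
                break;
            default:
                throw new IOException("Unsupported node type: " + node.getNodeType());
        }
    }

    private static void writeChildren(Node parent, Appendable out) throws IOException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, out);
        }
    }

    private static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }
}
```

The default branch is where a non-markup serializer would have to decide
which node types it can reasonably refuse.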



 But I think that we could do without it altogether and just
 make it possible to register new methods with the serializer
 factory. But I'll get to that in a minute.

If there is agreement on that, I'll just make Method (which is
designed to hold the default output method names, nothing more) part of
the helpers class or kill it. I think it makes sense for documenting
the common methods (see comments below), but it's not essential for
anything to work.


 And the type of the method could be the mime type which would
 avoid the need of a set/getMediaType on the OutputFormat object.
 And if this thing is really representing the mime type, perhaps
 it should be called such instead of Method. It would tie in
 better with existing standards.

XSLT defines an output method, which has one of three names (xml, html,
or text) or a qualified name for additional methods (like PDF, SVG, etc.).
It then defines media-type as a separate value. I don't like it, but it's
part of the spec and the serializers have to support that for the sake
of XSLT processing.

To select a serializer you use the method name. Generally serializers do
not care about the media type, but if we have a Servlet getting an XSLT
response, it would probably want to use the media type as the content
type. This is why getOutputFormat() exists, to extract the output format
and determine the media type.

The default output formats (and more can be supported) are defined in
the helpers class, all of which provide values for both method and media
type. In addition, the factory allows one to get an output format
suitable for a given output method, so you can determine the media type.

Not the best design, I agree, but one which follows the XSLT specs.
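
As a rough illustration of keeping method and media type as two separate
values, here is a sketch. OutputFormatSketch and forMethod() are hypothetical
stand-ins, not the proposed OutputFormat or factory; the three media types are
the defaults XSLT 1.0 gives for the xml, html and text methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an output format carrying both values.
final class OutputFormatSketch {
    private final String method;     // selects the serializer
    private final String mediaType;  // e.g. used as an HTTP content type

    OutputFormatSketch(String method, String mediaType) {
        this.method = method;
        this.mediaType = mediaType;
    }

    String getMethod() { return method; }
    String getMediaType() { return mediaType; }

    // Default formats for the three method names XSLT defines.
    private static final Map<String, OutputFormatSketch> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("xml", new OutputFormatSketch("xml", "text/xml"));
        DEFAULTS.put("html", new OutputFormatSketch("html", "text/html"));
        DEFAULTS.put("text", new OutputFormatSketch("text", "text/plain"));
    }

    /** Returns the default format for a method name, so a caller can read its media type. */
    static OutputFormatSketch forMethod(String method) {
        OutputFormatSketch format = DEFAULTS.get(method);
        if (format == null) {
            throw new IllegalArgumentException("No default format for method: " + method);
        }
        return format;
    }

    public static void main(String[] args) {
        OutputFormatSketch format = forMethod("html");
        System.out.println(format.getMethod() + " -> " + format.getMediaType());
    }
}
```

A servlet in the situation described above would pick the serializer by
getMethod() and pass getMediaType() to the response as its content type.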



 OutputFormat: It seems like a good idea to have a kind of
 properties object like OutputFormat. But it seems that the
 OutputFormat (and in fact the whole serializer API) is based
 on serializing to a text markup syntax. This sort of jumps
 the gun on what I'd like to say in general about the
 serialization API so I won't go any further at this point.
 Check out my comments below regarding this matter.

No, the serializer API does not assume markup, it was designed to
support PDF, JPEG, and other binary formats. An implementation should by
default support the three common text formats, but the API is designed
so other formats can be introduced as well.

Once again, if you read the XSLT spec, it clearly defines xml, html and
text; it does not define, but allows, other output methods. I followed the
same guidelines in coming up with this API.


 Serializer: I noticed that this design makes use of the SAX
 interfaces but not of the traversal APIs added with DOM Level
 2. Is there a way that we could leverage those interfaces?

Would make sense to support traversal for the DOMSerializer.

What would be the API requirements for that (other than
serializer(iterator))?


 SerializerFactory: There's no way to dynamically register
 OutputMethods or Serializers. I think that there should be
 a way to do this.

By definition the SerializerFactory is one way - but not the only way -
of obtaining serializers. You can also construct them directly. So no
need to go overboard with over generalizing it.

For registering serializers, I actually had a method for it, but I had
to pull it off and rethink it, since it would work better if it
registers both a serializer and a default OutputFormat.

I would definitely like to see a registration mechanism in the final
API.
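
One possible shape for such a registration mechanism, registering a serializer
and its default format together, is sketched below. Everything here is
hypothetical (SerializerRegistry, its method names, and the generic
parameters standing in for the serializer and format types); it is not the
method that was pulled from the API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry: S stands in for the serializer type, F for the format type.
final class SerializerRegistry<S, F> {
    private static final class Entry<S, F> {
        final Supplier<S> factory;
        final F defaultFormat;

        Entry(Supplier<S> factory, F defaultFormat) {
            this.factory = factory;
            this.defaultFormat = defaultFormat;
        }
    }

    private final Map<String, Entry<S, F>> entries = new ConcurrentHashMap<>();

    /** Registers a serializer factory and its default format under one method name. */
    void register(String method, Supplier<S> factory, F defaultFormat) {
        entries.put(method, new Entry<>(factory, defaultFormat));
    }

    /** Creates a new serializer for the given method name. */
    S newSerializer(String method) {
        return lookup(method).factory.get();
    }

    /** Returns the default format registered alongside the serializer. */
    F defaultFormat(String method) {
        return lookup(method).defaultFormat;
    }

    private Entry<S, F> lookup(String method) {
        Entry<S, F> entry = entries.get(method);
        if (entry == null) {
            throw new IllegalArgumentException("No serializer registered for method: " + method);
        }
        return entry;
    }
}
```

Registering both values in one call avoids the state where a method name has
a serializer but no format to report a media type from.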


 And overall, I'm not sure if we'd be allowed to drop stuff
 into the org.xml package namespace. Arkin: have you checked
 on this? And will any of this be superseded by DOM Level 3?
 at least on the DOM serialization side, that is... Perhaps
 Arnaud or someone else on the W3C committee can shed light
 on this.

We are not yet dropping anything. There are two proposals, the
Serializer 

Re: Review: Serializer API

2000-03-22 Thread Arkin
I could not understand: were you just sending an endElement without a
startElement?

(I can't check the line number right now, I have a newer copy on my
machine that hopefully fixes this bug.)

arkin

Boris Garbuzov wrote:
 
 String unexistingName = "unexistingName";
 documentHandler.endElement (unexistingName);
 
 Even if I misuse the API (I should not call this directly?) it should have 
 failed friendlier than this:
 
 java.lang.NullPointerException:
  at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
  at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
  at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
 
 

-- 
--
Assaf Arkin   www.exoffice.com
CTO, Exoffice Technologies, Inc.www.exolab.org




Upcoming Re: Review: Serializer API

2000-03-22 Thread Arkin
Sorry for not being available lately, we have two major releases
scheduled for the O'Reilly conference. No time to breathe.

The following is scheduled for release RSN:

The serializers have been revised to include some bug fixes, performance
improvements and preliminary support for encodings. They have also been
brought up to speed with the proposed API.

A WMLSerializer will be introduced along with a WML DOM (contributed by
David Li).

Minor bug fixes to the HTML DOM, and a version of the HTML parser for
testing purposes.

arkin

-- 
--
Assaf Arkin   www.exoffice.com
CTO, Exoffice Technologies, Inc.www.exolab.org




PATCH: Re: Xerces bug: base URI and external parsed entities

2000-03-22 Thread Norman Walsh
The following patch seems to fix the relative URI bug. If (one of)
the Xerces maintainers deems it worthy, please check it in :-)

Index: XMLDTDScanner.java
===
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
retrieving revision 1.4
diff -r1.4 XMLDTDScanner.java
1200a1201,1219

   // [EMAIL PROTECTED]
   //
   // An fSystemLiteral value from an entity declaration may be
   // a relative URI. If so, it's important that we make it
   // absolute with respect to the context of the document that
   // we are currently reading. If we don't, the XMLParser will
   // make it absolute with respect to the point of *reference*,
   // before attempting to read it. That's definitely wrong.
   //
   String litSystemId = fStringPool.toString(fSystemLiteral);
   String absSystemId = fEntityHandler.expandSystemId(litSystemId);
   if (!absSystemId.equals(litSystemId)) {
   // REVISIT - Is it kosher to touch fStringPool directly?
   // Is there a better way? fEntityReader doesn't seem to
   // have an addString method that takes a literal string.
   fSystemLiteral = fStringPool.addString(absSystemId);
   }

2376a2396
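
The distinction the patch comment draws, resolving a relative system
identifier against the entity that declares it rather than the point of
reference, can be shown with plain java.net.URI. This is a sketch of the idea
only; SystemIdResolution and expand() are hypothetical names, the
example.org URIs are invented, and this is not the expandSystemId code the
patch calls.

```java
import java.net.URI;

// Hypothetical illustration of resolving a system identifier against a base URI.
class SystemIdResolution {
    /** Resolves a possibly relative system identifier against the given base URI. */
    static String expand(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        String declaredIn = "http://example.org/dtd/book.dtd";      // where the entity is declared
        String referencedFrom = "http://example.org/docs/book.xml"; // where &entity; is used
        String systemId = "chapters/one.xml";

        // Intended behaviour: relative to the declaring entity.
        System.out.println(expand(declaredIn, systemId));
        // The behaviour the patch is meant to avoid: relative to the point of reference.
        System.out.println(expand(referencedFrom, systemId));
    }
}
```

The two calls yield different locations for the same literal, which is why
the identifier has to be made absolute at declaration time.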


Be seeing you,
  norm

-- 
Norman Walsh [EMAIL PROTECTED]  | Nothing ever gets anywhere. The
http://nwalsh.com/ | earth keeps turning round and gets
   | nowhere. The moment is the only
   | thing that counts.--Jean Cocteau



Re: [Xerces-J] Serializable Documents

2000-03-22 Thread Arkin
 Anyway, I think it's more efficient to write the document to
 XML form, serialize *that*, and reparse it on the other end
 than to use Java serialization of Objects. The standard
 Serializable format on the wire is huge.

And not as efficient as people would think due to the use of reflection.

arkin

 
 --
 Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
 
   
 Name: Serializable.java
 Type: application/x-unknown-content-type-java_auto_file
 Encoding: base64

-- 
--
Assaf Arkin   www.exoffice.com
CTO, Exoffice Technologies, Inc.www.exolab.org


Re: Review: Serializer API

2000-03-22 Thread Arkin
That's the explanation.

arkin


Jeffrey Rodriguez wrote:
 
 Mr. Garbuzov, you just found a bug.
 
 In XMLSerializer
 
 public void endElement( String namespaceURI, String localName,
 String rawName )
 {
 ElementState state;
 
 // Works much like content() with additions for closing
 // an element. Note the different checks for the closed
 // element's state and the parent element's state.
 unindent();
 state = getElementState();
 if ( state.empty ) {
 
 In this method, the getElementState() call may return null,
 so trying to access state.empty would cause a NullPointerException.
 
 To fix this bug, we need to check whether state is null first.
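
A minimal sketch of that null check, using a stand-in class rather than the
real XMLSerializer: GuardedSerializer, its ElementState, and the choice to
throw IllegalStateException are all assumptions made for illustration. The
thread does not say whether the real fix should throw or ignore the call.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in showing a guarded endElement(); not the Xerces class.
class GuardedSerializer {
    private static final class ElementState {
        final String name;
        boolean empty = true;   // true until the element receives content

        ElementState(String name) { this.name = name; }
    }

    private final Deque<ElementState> stack = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void startElement(String name) {
        ElementState parent = stack.peek();
        if (parent != null && parent.empty) {
            out.append('>');          // parent turns out to have content
            parent.empty = false;
        }
        out.append('<').append(name);
        stack.push(new ElementState(name));
    }

    void endElement(String name) {
        ElementState state = stack.peek();   // null when no element is open
        if (state == null) {
            throw new IllegalStateException(
                "endElement(\"" + name + "\") called with no open element");
        }
        if (!state.name.equals(name)) {
            throw new IllegalStateException(
                "endElement(\"" + name + "\") does not match open element \"" + state.name + "\"");
        }
        stack.pop();
        if (state.empty) {
            out.append("/>");
        } else {
            out.append("</").append(name).append('>');
        }
    }

    String result() { return out.toString(); }
}
```

With this guard the stray endElement call from the report fails with a
message naming the element instead of a bare NullPointerException.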
 
 Thanks,
 Jeffrey Rodriguez
 XML4J Support
 IBM Cupertino
 
 From: Boris Garbuzov [EMAIL PROTECTED]
 Reply-To: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: Review: Serializer API
 Date: Tue, 21 Mar 2000 16:57:05 -0800
 
  String unexistingName = "unexistingName";
  documentHandler.endElement (unexistingName);
 
 Even if I misuse the API (I should not call this directly?) it should have
 failed friendlier than this:
 
 java.lang.NullPointerException:
   at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:307)
   at org.apache.xml.serialize.XMLSerializer.endElement(XMLSerializer.java:421)
   at com.keystrokenet.loanproduct.xml.test.Lab.executeTestBody(Lab.java:442)
 
 
 

-- 
--
Assaf Arkin   www.exoffice.com
CTO, Exoffice Technologies, Inc.www.exolab.org


Re: Strange way to handle white spaces during parsing

2000-03-22 Thread Andy Clark
Norman Walsh wrote:
 Huh? I guess I wasn't clear. I explicitly constructed a document
 that was well-formed but not valid. My question had to do
 with ignorable whitespace in a non-validating parse.

Gotcha. In the non-validating case (with or without a DTD), all
character content is significant. (I'd have to verify the
with-DTD case, though.)
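
One way to check the with-DTD case is to parse Norman's example with a
non-validating DOM parser and count the children of a. The sketch below uses
the standard JAXP DocumentBuilderFactory with its default settings, which is
an assumption on my part and not necessarily how any particular Xerces build
behaves; WhitespaceChildren is a made-up class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check: how many children does <a> have in a non-validating parse?
class WhitespaceChildren {
    static final String XML =
        "<!DOCTYPE test [\n"
        + "<!ELEMENT a (b+)>\n"
        + "<!ELEMENT b (#PCDATA)>\n"
        + "]>\n"
        + "<a>test<b/> <b/>this! 4 or 5?</a>";

    /** Parses without validation and returns the number of child nodes of the root. */
    static int countChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);   // well-formedness only
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("children of a: " + countChildren(XML));
    }
}
```

If the whitespace between the two b elements is passed through, as Section
2.10 of the XML 1.0 REC requires, the count should come out as five: two text
nodes around the b elements plus the whitespace-only text node between them.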

-- 
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]


Re: Strange way to handle white spaces during parsing

2000-03-22 Thread roddey



Sorry, my mind wasn't in gear. I was pointing out how it could work, but
should have pointed out that we of course do the right thing (at least the
C++ parser) and always call characters() if we are not validating.


Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Norman Walsh [EMAIL PROTECTED] on 03/22/2000 11:26:02 AM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Re: Strange way to handle white spaces during parsing



/ [EMAIL PROTECTED] was heard to say:
| If a DTD is present, it's read and the information required to make this
| decision is present. It doesn't require validation, just a check to see
| what type of content model the element has.

I'm not comfortable with that answer at all. I think an option that
ignores element whitespace in a non-validating parse is non-standard
and potentially dangerous. Consider:

  The XML 1.0 REC, Section 2.10:

  An XML processor must always pass all characters in a document
  that are not markup through to the application. A validating
  XML processor must also inform the application which of these
  characters constitute white space appearing in element
  content.

I can't think of any way to interpret that such that a
non-validating parse could ignore whitespace.

Consider the following example:

<!DOCTYPE test [
<!ELEMENT a (b+)>
<!ELEMENT b (#PCDATA)>
]>
<a>test<b/> <b/>this! 4 or 5?</a>

Does a have four children or five? The answer has to be five.

And what about a document with an external subset that has
parameter entities that cannot be located, so that the DTD is
really half a loaf? Does it ignore whitespace in content models
that it found, but not in others?

Be seeing you,
  norm

--
Norman Walsh [EMAIL PROTECTED]  | As a general rule, the most
http://nwalsh.com/ | successful man in life is the man
   | who has the best
   | information.--Benjamin Disraeli






Configurable missing?

2000-03-22 Thread brk
Hi all,
It seems that org.xml.sax.Configurable was removed from xerces.jar
starting with 1.0.2. Was this intentional, and if so, where should I find
it?

Bryn Keller
Senior Software Engineer
Jenkon International
[EMAIL PROTECTED]



RE: SVG goes to DOM

2000-03-22 Thread COFFMAN Steven
This was a FOP message, but you're the DOM experts, so I'd like to get your
input. 

The end result we want is that Scalable Vector Graphics (SVG) be translated
to PDF. Kieron's been treating SVG sort of as a special case of XSL:FO,
which is why it's been [uncommitted, but still] in FOP.

If SVG is going to be DOM based, rather than treated as a special-case form
of XSL:FO, then that immediately says "Xerces" to me. Should an
implementation of the W3C's SVG DOM be part of Xerces? Does that allow us to
do anything cool? Should it continue to be part of FOP? If so, how can we be
consistent with the Xerces stuff?
-Steve
-Original Message-
From: Keiron Liddle [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2000 7:08 AM
To: fop-dev@xml.apache.org
Subject: SVG



I've had a look at the SVG dom classes.

I will be moving all the svg code into this model once I figure out a few
things. This means some major restructuring.

This raises some questions.

Should the implementation be placed in
org.apache.svg.dom.*?

Should all implementation classes be called
<interface name>Impl.java?

Is it possible to have property makers (i.e.
propertyTable.put("width", SVGLengthProperty.maker()))
that apply only to a particular XML element or maybe XML namespace?
If not, then there will be some problems with properties in SVG and FOP that
have the same name but need to return different objects (without making
Property bloated).
Also, in SVG the text element can have a list of x values, which should be
parsed into a list, while other x values should only be a single number.

COFFMAN Steven wrote:

 In PDF, SVG, XSL, etc. we're flinging RGB floating point color components
 around. I'd like to fling a color object around instead.

The SVG color object will be the implementation of the
org.w3c.dom.svg.SVGColor which holds an RGBColor



static library for unix build?

2000-03-22 Thread Dean Hoover
Are you going to add creation of a static library for
Xerces C in some future release? I would rather not
use the shared library.

Thanks in advance.

Dean Hoover



Re: static library for unix build?

2000-03-22 Thread roddey



No, we do not support a static configuration, nor do we really have any
plans to. You are free to do it yourself, but we will not have any
officially supported static configuration.


Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Dean Hoover [EMAIL PROTECTED] on 03/22/2000 03:07:59 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  static library for unix build?



Are you going to add creation of a static library for
Xerces C in some future release? I would rather not
use the shared library.

Thanks in advance.

Dean Hoover




Re: How to finish/close a document

2000-03-22 Thread Ralf I. Pfeiffer
DTD/Schema access, caching, and re-validation are being investigated. These
issues are also on the table for DOM Level 3 discussion.

Currently, your method of writing XML and re-parsing, however convoluted, is
the easiest.

Regards,
-Ralf





Difference between Xerces 1.1 and IBM XML4C

2000-03-22 Thread PHAN Tri
Hello,

Being fairly new to XML, I was wondering if anyone can answer the following
questions.

1. What is the difference between Xerces 1.1.0 and IBM XML4C 3.1?
2. Would you recommend the use of Xerces in mission-critical, real-time
applications?
3. What type of support will we have with Xerces 1.1.0? Is there commercial
support for Xerces?

Thanks,

Tri Phan
SWIFT