Re: Changes necessary in StAX to get it on par with DecentXML

2008-08-08 Thread Jochen Wiedmann
Hi, Aaron,

On Wed, Aug 6, 2008 at 9:48 AM, Aaron Digulla [EMAIL PROTECTED] wrote:

 That didn't work well. Okay, since you don't believe me, here is an
 (incomplete) list of changes I would need in StAX to be able to use it for
 my work instead of having to write my own XML parser.

I think there is a misunderstanding here. The question was not (at least
not IMO) whether you could rewrite DecentXML to use StAX, but whether
you could give DecentXML a StAX API. That way Maven components could
rely on a standard API, except for the cases where the
differences matter.

Jochen


-- 
Look, that's why there's rules, understand? So that you think before
you break 'em.

 -- (Terry Pratchett, Thief of Time)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Changes necessary in StAX to get it on par with DecentXML

2008-08-08 Thread Aaron Digulla

Quoting Jochen Wiedmann [EMAIL PROTECTED]:


 I think there is a misunderstanding here. The question was not (at least
 not IMO) whether you could rewrite DecentXML to use StAX, but whether
 you could give DecentXML a StAX API. That way Maven components could
 rely on a standard API, except for the cases where the
 differences matter.


Internally, DecentXML uses a StAX-like API (the DOM parser part uses a  
tokenizer to break the input into pieces). The problem is that I can't  
base DecentXML on StAX because StAX throws information away that I  
need and I can't offer a StAX-compatible API because StAX isn't meant  
to keep the information I need to preserve.
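To make "StAX throws information away" concrete, here is a minimal sketch using only the JDK's built-in StAX cursor API (javax.xml.stream). Two differently formatted start tags produce the same event stream, so whitespace inside the tag, the attribute quote character, and the `<a/>` vs. `<a></a>` distinction are gone before application code sees anything. The class and method names are illustrative and are not part of DecentXML or Maven.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxInformationLoss {
    /** Summarizes the cursor events StAX reports for a document (illustrative helper). */
    static List<String> events(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                StringBuilder sb = new StringBuilder("start " + r.getLocalName());
                // Attributes come back as name/value pairs only; the quote character
                // and the whitespace around them are not reported anywhere.
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    sb.append(' ').append(r.getAttributeLocalName(i))
                      .append('=').append(r.getAttributeValue(i));
                }
                out.add(sb.toString());
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                out.add("end " + r.getLocalName());
            } else if (ev == XMLStreamConstants.CHARACTERS) {
                out.add("text [" + r.getText() + "]");
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        // An empty-element tag with single quotes and extra spaces...
        System.out.println(events("<a  x='1'   y='2'/>"));
        // ...and a start/end pair with double quotes yield identical events.
        System.out.println(events("<a x=\"1\" y=\"2\"></a>"));
    }
}
```

Both calls should print the same list, which is why a filter built on top of this API cannot write the original text back.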


I could create a filter class which strips the information that  
DecentXML provides down to a StAX API but that would still be useless:


Case 1: You read POM files to create an object model. Here you would
gain nothing except new bugs to hunt down.


Case 2: You read POM files to filter them. Here the StAX API is
useless because it throws too much information away, so this code
would have to be written from scratch against an incompatible API
anyway, no matter how you solve it.


That said, I've written a Maven search'r'replace tool using DecentXML.
It's standalone (no dependencies besides DecentXML and Java's rt.jar).
It can find pom.xml files that contain certain elements/texts, and in
those files it can print certain parts (search), check that certain parts
have certain values (for example, that all parent elements have the right
version in them), and replace existing values with new ones.
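For the read-only half of such a tool (printing parts, or checking that every `parent` element carries the expected version), plain StAX is enough, which matches case 1 above. The following is a minimal sketch with the JDK's cursor API. `PomTextFinder` and `findText` are hypothetical names, this is not how MavenSNR.java works, and it deliberately stops short of the replace step, which is exactly where StAX's lost formatting becomes a problem.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PomTextFinder {
    /**
     * Returns the trimmed text of every element whose path from the root
     * (local names joined by '/') equals path, e.g. "project/parent/version".
     * Namespaces are ignored, so the usual POM namespace does not get in the way.
     * The matched element must be text-only, or getElementText() throws.
     */
    static List<String> findText(String xml, String path) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String> stack = new ArrayDeque<>();
        List<String> hits = new ArrayList<>();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                stack.addLast(r.getLocalName());
                if (String.join("/", stack).equals(path)) {
                    // getElementText() leaves the reader on the matching END_ELEMENT,
                    // so the loop never sees it; pop the name here instead.
                    hits.add(r.getElementText().trim());
                    stack.removeLast();
                }
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                stack.removeLast();
            }
        }
        r.close();
        return hits;
    }

    public static void main(String[] args) throws XMLStreamException {
        String pom = "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">"
                + "<parent><artifactId>p</artifactId><version>1.2</version></parent>"
                + "<artifactId>child</artifactId><version>2.0</version></project>";
        System.out.println(findText(pom, "project/parent/version")); // [1.2]
        System.out.println(findText(pom, "project/version"));        // [2.0]
    }
}
```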


The source is roughly 400 lines:  
http://code.google.com/p/decentxml/source/browse/trunk/src/test/java/de/pdark/decentxml/MavenSNR.java?r=34


What I would like to see is that Maven keeps the StAX API internally
to build the POM and to do its job. Only when a plugin needs to
*manipulate* XML that a *user* has written *by hand* should it use
DecentXML. Currently, I know of only these plugins that fall into
that category:


- archetype
- war
- version

All other plugins wouldn't gain enough from a transition to make the  
effort worthwhile.


Regards,

--
Aaron Optimizer Digulla a.k.a. Philmann Dark
It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits.
http://www.pdark.de/




Re: Changes necessary in StAX to get it on par with DecentXML

2008-08-08 Thread Stephen Connolly
On Fri, Aug 8, 2008 at 2:15 PM, Aaron Digulla [EMAIL PROTECTED] wrote:

 Internally, DecentXML uses a StAX-like API (the DOM parser part uses a
 tokenizer to break the input into pieces). The problem is that I can't base
 DecentXML on StAX because StAX throws information away that I need and I
 can't offer a StAX-compatible API because StAX isn't meant to keep the
 information I need to preserve.



I actually think that this can be done... but only using the
EventReader/EventWriter model.

All the XMLEvents returned by the XMLEventReader are just interfaces.

For each type of event, you store the information you need in a concrete
class. In fact I'm writing an implementation just now.

The XMLEventWriter then takes the events back... if they are events that
originated from my XMLEventReader, I just use the exact padding, prefixing,
etc. that was read in... if they are new events that originated from
elsewhere, I examine how previous events were formatted and try to
replicate that.
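The "examine how previous events were formatted" part can be partly sketched with the stock JDK: whitespace between elements does survive as `Characters` events, so a new element can borrow the indentation that preceded an existing sibling. This is a minimal sketch under that assumption only. `IndentCopyingInsert` and `insertAfter` are hypothetical names, not Stephen's implementation, and the sketch cannot recover what the standard reader never reports (whitespace inside tags, attribute quoting), which is what custom event classes would have to store.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class IndentCopyingInsert {
    /**
     * Copies xml event by event; after the first </sibling> end tag it inserts
     * <newName>newText</newName>, preceded by the same whitespace that preceded
     * the sibling's start tag.
     */
    static String insertAfter(String xml, String sibling, String newName, String newText)
            throws XMLStreamException {
        XMLInputFactory in = XMLInputFactory.newInstance();
        in.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // one event per text run
        XMLEventReader reader = in.createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory ef = XMLEventFactory.newInstance();

        String pendingWhitespace = ""; // whitespace-only text seen right before the current event
        String siblingIndent = null;   // indentation captured at the sibling's start tag
        boolean inserted = false;
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            // Skip document events so no XML declaration is added to the output.
            if (e.isStartDocument() || e.isEndDocument()) continue;
            writer.add(e);
            if (e.isCharacters() && e.asCharacters().isWhiteSpace()) {
                pendingWhitespace = e.asCharacters().getData();
                continue;
            }
            if (e.isStartElement() && siblingIndent == null
                    && e.asStartElement().getName().getLocalPart().equals(sibling)) {
                siblingIndent = pendingWhitespace;
            } else if (e.isEndElement() && !inserted && siblingIndent != null
                    && e.asEndElement().getName().getLocalPart().equals(sibling)) {
                // Replicate the sibling's leading whitespace for the new element.
                writer.add(ef.createCharacters(siblingIndent));
                writer.add(ef.createStartElement("", "", newName));
                writer.add(ef.createCharacters(newText));
                writer.add(ef.createEndElement("", "", newName));
                inserted = true;
            }
            pendingWhitespace = "";
        }
        writer.flush();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String pom = "<project>\n  <groupId>g</groupId>\n  <artifactId>x</artifactId>\n</project>";
        System.out.println(insertAfter(pom, "artifactId", "version", "1.0"));
    }
}
```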

I agree that the cursor interface of StAX will not do what we need... but
the Event interface can, as I suspected (but needed to confirm), do exactly
what we need... and I am in the process of writing an implementation.

The only issue that I could see blocking me is how the StAX specification
defines the handling of CR, LF and CR/LF pairs. I suspect that I can stash
this information when reading provided the XMLEventReader is constructed
from the _correct_ stream.
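The CR/LF issue comes from XML 1.0's end-of-line handling (section 2.11): a conforming parser must pass both `\r\n` and a lone `\r` to the application as `\n`, so by the time any StAX event exists the original line endings are gone. One way to stash them, in the spirit of reading from the correct stream, is to inspect the raw text before it reaches the parser. A minimal sketch; `dominantLineEnding` is a hypothetical helper, not part of StAX or of Stephen's implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LineEndings {
    /** Concatenates all character data StAX reports for the document. */
    static String parsedText(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                sb.append(r.getText());
            }
        }
        r.close();
        return sb.toString();
    }

    /** Hypothetical helper: the most frequent line ending in the raw, unparsed text. */
    static String dominantLineEnding(String raw) {
        int crlf = 0, cr = 0, lf = 0;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\r') {
                if (i + 1 < raw.length() && raw.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // consume the '\n' of the pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        if (crlf > 0 && crlf >= lf && crlf >= cr) return "\r\n";
        if (cr > lf) return "\r";
        return "\n"; // default when there are no line breaks at all
    }

    public static void main(String[] args) throws XMLStreamException {
        String raw = "<a>line1\r\nline2\r\n</a>";
        // The parser has already normalized CR/LF to LF...
        System.out.println(parsedText(raw).contains("\r"));
        // ...but the raw text still tells a writer what to emit.
        System.out.println(dominantLineEnding(raw).equals("\r\n"));
    }
}
```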

-Stephen.









Re: Changes necessary in StAX to get it on par with DecentXML

2008-08-08 Thread Stephen Connolly
OK... one change needed in the StAX API is to fix
XMLOutputFactory.newInstance(String, ClassLoader) to return XMLOutputFactory
and not XMLInputFactory.

;-)
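That method really is declared with the wrong return type, so `XMLOutputFactory f = XMLOutputFactory.newInstance(id, loader);` does not compile. Later JDK releases added `XMLOutputFactory.newFactory(String, ClassLoader)` with the correct return type and deprecated the old method for exactly this reason. A minimal sketch of the working lookup; the class name is illustrative, and passing the base factory ID `javax.xml.stream.XMLOutputFactory` is an assumption that lets the JDK fall back to its built-in implementation when nothing else is configured.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class OutputFactoryLookup {
    // The deprecated XMLOutputFactory.newInstance(String, ClassLoader) is declared
    // to return XMLInputFactory; newFactory(String, ClassLoader) returns XMLOutputFactory.
    static XMLOutputFactory lookup(ClassLoader loader) {
        return XMLOutputFactory.newFactory("javax.xml.stream.XMLOutputFactory", loader);
    }

    /** Writes a tiny document to show the factory is usable. */
    static String writeSample(XMLOutputFactory factory) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = factory.createXMLStreamWriter(out);
        w.writeStartElement("a");
        w.writeAttribute("x", "1");
        w.writeCharacters("hi");
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLOutputFactory f = lookup(OutputFactoryLookup.class.getClassLoader());
        System.out.println(writeSample(f));
    }
}
```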

On Fri, Aug 8, 2008 at 7:55 PM, Stephen Connolly 
[EMAIL PROTECTED] wrote:






Re: Changes necessary in StAX to get it on par with DecentXML

2008-08-08 Thread Stephen Connolly
For round-tripping, I am 99.99% certain that no other changes are required to
StAX (provided you are fine with the Event half of the API).

For generating XML that matches exacting formatting, we would need a special
mode in which ignorableSpace events can be raised and received in between
Attribute events.

There might also be a need for another event to signal the end of the
start tag.
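The attribute point can be seen with the stock JDK: the event reader never emits attributes as separate events but bundles them into the `StartElement`, so there is no position between them at which whitespace could be reported. On the writing side, `XMLEventWriter` does accept standalone `Attribute` events after a `StartElement`, which is where the proposed whitespace events would have to slot in. A minimal sketch; class and method names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Iterator;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class AttributeEvents {
    /** Returns {standalone ATTRIBUTE events seen, attributes carried inside StartElements}. */
    static int[] countAttributes(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int standalone = 0, bundled = 0;
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isAttribute()) standalone++;
            if (e.isStartElement()) {
                Iterator<?> it = e.asStartElement().getAttributes();
                while (it.hasNext()) {
                    it.next();
                    bundled++;
                }
            }
        }
        reader.close();
        return new int[] {standalone, bundled};
    }

    /** Writes a start tag whose attributes are added as separate Attribute events. */
    static String writeWithAttributeEvents() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory ef = XMLEventFactory.newInstance();
        w.add(ef.createStartElement("", "", "a"));
        w.add(ef.createAttribute("x", "1"));
        // A hypothetical "whitespace between attributes" event would go here;
        // standard StAX has none, and adding Characters would end the start tag.
        w.add(ef.createAttribute("y", "2"));
        w.add(ef.createEndElement("", "", "a"));
        w.flush();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        int[] counts = countAttributes("<a  x='1'   y='2'><b z='3'/></a>");
        System.out.println(counts[0] + " standalone, " + counts[1] + " bundled");
        System.out.println(writeWithAttributeEvents());
    }
}
```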

-Stephen

On Sat, Aug 9, 2008 at 12:58 AM, Stephen Connolly 
[EMAIL PROTECTED] wrote:
