Re: Changes necessary in StAX to get it on par with DecentXML
Hi, Aaron,

On Wed, Aug 6, 2008 at 9:48 AM, Aaron Digulla [EMAIL PROTECTED] wrote:

> That didn't work well. Okay, since you don't believe me, here is an (incomplete) list of changes I would need in StAX to be able to use it for my work instead of having to write my own XML parser.

I think there is a misunderstanding here. The question was not (at least not IMO) whether you could rewrite DecentXML to use StAX, but whether you could give DecentXML a StAX API. That way Maven components could rely on a standard API, except in the cases where the differences matter.

Jochen

-- 
Look, that's why there's rules, understand? So that you think before you break 'em.
-- (Terry Pratchett, Thief of Time)

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Changes necessary in StAX to get it on par with DecentXML
Quoting Jochen Wiedmann [EMAIL PROTECTED]:

> The question was not (at least not IMO) whether you could rewrite DecentXML to use StAX, but whether you could give DecentXML a StAX API.

Internally, DecentXML uses a StAX-like API (the DOM parser part uses a tokenizer to break the input into pieces). The problem is that I can't base DecentXML on StAX because StAX throws information away that I need and I can't offer a StAX-compatible API because StAX isn't meant to keep the information I need to preserve.

I could create a filter class which strips the information that DecentXML provides down to a StAX API, but that would still be useless:

Case 1: You read POM files to create an object model. Here, you would gain nothing, except that you'd have to hunt for new bugs.

Case 2: You read POM files to filter them. Here the StAX API is useless because it throws too much information away, so this code would have to be written from scratch against an incompatible API anyway, no matter how you solve it.

That said, I've written a Maven search'r'replace tool using DecentXML. It's standalone (no dependencies besides DecentXML and Java's rt.jar). It can search for pom.xml files containing certain elements/texts, and in those files it can print certain parts (search), check that certain parts have certain values (for example, that all parent elements contain the right version) and replace existing values with new ones.
The source is roughly 400 lines: http://code.google.com/p/decentxml/source/browse/trunk/src/test/java/de/pdark/decentxml/MavenSNR.java?r=34

What I would like to see is that Maven keeps the StAX API internally to build the POM and to do its job. Only a plugin that needs to *manipulate* XML that a *user* has written *by hand* should use DecentXML. Currently, I know of only these plugins that fall into that category:

- archetype
- war
- version

All other plugins wouldn't gain enough from a transition to make the effort worthwhile.

Regards,
-- 
Aaron Optimizer Digulla a.k.a. Philmann Dark
It's not the universe that's limited, it's our imagination. Follow me and I'll show you something beyond the limits.
http://www.pdark.de/
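To make the information loss concrete, here is a minimal sketch (the class name `StaxRoundTrip` is hypothetical) that pipes a document through the JDK's built-in StAX event reader straight into an event writer. Element names and attribute values survive, but lexical details such as single-quoted attribute values and the extra whitespace between attributes do not. Depending on the implementation, `<parent .../>` may also come back as `<parent ...></parent>` and attribute order may change. The exact output text is implementation-dependent, so the checks only look at quote style and spacing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class StaxRoundTrip {

    /** Parses xml with an XMLEventReader and immediately re-serializes every event. */
    public static String roundTrip(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        writer.add(reader); // copies every remaining event from the reader
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hand-written formatting: single quotes, extra spaces, an empty-element tag.
        String pom = "<parent  groupId='org.example'   relativePath=\"..\"/>";
        System.out.println(pom);
        System.out.println(roundTrip(pom));
    }
}
```

Nothing in the event objects records which quote character or how much whitespace was used, so no writer can restore them from standard events alone.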
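Aaron's first case (reading a POM only to build a model or check values) needs no formatting preservation, so a plain StAX cursor is enough there. The following is a hypothetical sketch, not code from MavenSNR, of a read-only "check the parent version" step: it tracks the element path by local name (which sidesteps the POM namespace) and returns the text of `project/parent/version`. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PomParentVersion {

    /** Returns the text of project/parent/version, or null if the POM has none. */
    public static String parentVersion(String pomXml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(pomXml));
        Deque<String> path = new ArrayDeque<>();
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(r.getLocalName()); // local name ignores the POM namespace
                    if (path.size() == 3 && String.join("/", path).equals("project/parent/version")) {
                        return r.getElementText().trim(); // reads up to </version>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }
}
```

Writing a corrected version back while keeping the user's hand-written formatting is the part this thread is debating.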
Re: Changes necessary in StAX to get it on par with DecentXML
On Fri, Aug 8, 2008 at 2:15 PM, Aaron Digulla [EMAIL PROTECTED] wrote:

> The problem is that I can't base DecentXML on StAX because StAX throws information away that I need and I can't offer a StAX-compatible API because StAX isn't meant to keep the information I need to preserve.

I actually think that this can be done, but only with the event model (XMLEventReader/XMLEventWriter). All the XMLEvents returned by the XMLEventReader are just interfaces, so for each type of event you can store the information you need in a concrete class. In fact I'm writing an implementation just now.

The XMLEventWriter then takes the events back. If they are events that originated from my XMLEventReader, I just reuse the exact padding, prefixes, etc. that were read in. If they are new events that originated elsewhere, I examine how the previous events were formatted and try to replicate that.

I agree that the cursor interface of StAX (XMLStreamReader/XMLStreamWriter) will not do what we need, but the event interface can (as I suspected, but needed to confirm) do exactly what we need, and I am in the process of writing an implementation.

The only issue that I could see blocking me is how the StAX specification defines the handling of CR, LF and CR/LF pairs. I suspect that I can stash this information when reading, provided the XMLEventReader is constructed from the _correct_ stream.

-Stephen.
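A rough sketch of the idea that XMLEvents are "just interfaces": the hypothetical `RawText` interface and `withRawText` helper below use a `java.lang.reflect.Proxy` to wrap a standard event so that it still implements `StartElement`, `EndElement` or `Characters`, but also carries the exact source text it came from. The matching writer reuses that text when present and falls back to `XMLEvent.writeAsEncodedUnicode` for new events. This is not Stephen's implementation. The standard JDK reader does not expose raw text, so the raw strings here are supplied by hand, and the "replicate the formatting of previous events" heuristic is left out.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class RawTextEvents {

    /** Hypothetical extra interface: the exact source text an event was parsed from. */
    public interface RawText {
        String rawText();
    }

    /** Wraps a StAX event in a proxy that keeps its StAX interface and adds RawText. */
    public static XMLEvent withRawText(XMLEvent event, String raw) {
        Class<?> staxType = event.isStartElement() ? StartElement.class
                : event.isEndElement() ? EndElement.class
                : event.isCharacters() ? Characters.class
                : XMLEvent.class;
        return (XMLEvent) Proxy.newProxyInstance(
                RawText.class.getClassLoader(),
                new Class<?>[] {staxType, RawText.class},
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == RawText.class) {
                        return raw; // the extra information a plain event would lose
                    }
                    try {
                        return method.invoke(event, args); // everything else is delegated
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    }
                });
    }

    /** Reuses raw text where available; new events fall back to StAX's own serialization. */
    public static String write(List<XMLEvent> events) throws XMLStreamException {
        StringWriter out = new StringWriter();
        for (XMLEvent e : events) {
            if (e instanceof RawText) {
                out.write(((RawText) e).rawText());
            } else {
                e.writeAsEncodedUnicode(out);
            }
        }
        return out.toString();
    }

    public static String demo() throws XMLStreamException {
        XMLEventFactory f = XMLEventFactory.newInstance();
        List<XMLEvent> events = Arrays.asList(
                withRawText(f.createStartElement("", "", "version"), "<version  >"),
                f.createCharacters("2.0"), // a "new" event with no raw text
                withRawText(f.createEndElement("", "", "version"), "</version >"));
        return write(events);
    }
}
```

Because code that consumes events only sees the StAX interfaces, the extra data rides along invisibly until a writer that knows about `RawText` asks for it.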
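Background on the CR/LF issue: XML 1.0 (section 2.11) requires a parser to translate CR LF pairs and lone CRs into a single LF before handing text to the application, so a StAX consumer never sees the original line ends. One possible reading of "constructed from the _correct_ stream" is to keep access to the raw characters, sniff the line separator before parsing, and re-apply it on output. The hypothetical `LineEndings` class below does that, assuming a file uses one separator throughout; mixed line ends would still be lost.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LineEndings {

    /** Returns the first line separator found in the raw, unparsed text. */
    public static String detect(String raw) {
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\r') {
                return (i + 1 < raw.length() && raw.charAt(i + 1) == '\n') ? "\r\n" : "\r";
            }
            if (c == '\n') {
                return "\n";
            }
        }
        return "\n"; // assumption: no line breaks at all, so any choice will do
    }

    /** Text of the root element as StAX delivers it (after end-of-line normalization). */
    public static String parsedRootText(String raw) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(raw));
        try {
            r.nextTag(); // move to the root START_ELEMENT
            return r.getElementText();
        } finally {
            r.close();
        }
    }

    /** Re-applies the detected separator to normalized text before writing it back. */
    public static String restore(String normalized, String separator) {
        return normalized.replace("\n", separator);
    }
}
```

The key design point is that detection must happen on the raw characters; once the text has passed through any conforming XML parser, the information is gone.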
Re: Changes necessary in StAX to get it on par with DecentXML
OK... one change needed in the StAX API is to fix XMLOutputFactory.newInstance(String, ClassLoader) to return XMLOutputFactory and not XMLInputFactory. ;-)

On Fri, Aug 8, 2008 at 7:55 PM, Stephen Connolly [EMAIL PROTECTED] wrote:

> In fact I'm writing an implementation just now.
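The signature bug is easy to confirm by reflection. In the JDK's `javax.xml.stream`, `XMLOutputFactory.newInstance(String, ClassLoader)` is still declared to return `XMLInputFactory`; it was later deprecated in favour of `newFactory(String, ClassLoader)`, which returns `XMLOutputFactory`. The small, hypothetical `OutputFactorySignature` class below only inspects the declared return types. It deliberately avoids calling the two-argument methods, since their lookup behaviour depends on system properties and service configuration.

```java
import java.lang.reflect.Method;
import javax.xml.stream.XMLOutputFactory;

public class OutputFactorySignature {

    /** Declared return type of XMLOutputFactory's static (String, ClassLoader) method. */
    public static Class<?> returnTypeOf(String methodName) throws NoSuchMethodException {
        Method m = XMLOutputFactory.class.getMethod(methodName, String.class, ClassLoader.class);
        return m.getReturnType();
    }

    public static void main(String[] args) throws Exception {
        // The method discussed in the thread: declared to return the *input* factory.
        System.out.println(returnTypeOf("newInstance").getName());
        // The replacement method is typed as an output factory.
        System.out.println(returnTypeOf("newFactory").getName());
        // The no-argument factory method was always typed correctly.
        XMLOutputFactory out = XMLOutputFactory.newInstance();
        System.out.println(out.getClass().getName());
    }
}
```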
Re: Changes necessary in StAX to get it on par with DecentXML
For round-tripping, I am 99.99% certain that no other changes to StAX are required (provided you are fine with the event half of the API).

For generating XML that matches exacting formatting, StAX would need a special mode in which ignorableSpace events can be raised and received in between Attribute events. Some other event might also be needed to signal the end of the start tag.

-Stephen

On Sat, Aug 9, 2008 at 12:58 AM, Stephen Connolly [EMAIL PROTECTED] wrote:

> OK... one change needed in the StAX API is to fix XMLOutputFactory.newInstance(String, ClassLoader) to return XMLOutputFactory and not XMLInputFactory. ;-)
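To illustrate the proposed "exact formatting" mode, here is a hypothetical consumer-side sketch: after a `StartElement`, the writer receives an ordered mix of `Attribute` events and whitespace `Characters` events (created here with `XMLEventFactory.createIgnorableSpace`), and reaching the end of the list stands in for the missing "end of start tag" event. None of this exists in StAX today; the event convention and the `StartTagWriter` name are assumptions. The attribute quote character is still not representable, so the sketch always writes double quotes.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StartTagWriter {

    /** Renders a start tag from interleaved Attribute and whitespace Characters events. */
    public static String renderStartTag(StartElement start, List<XMLEvent> inside) {
        StringBuilder sb = new StringBuilder("<").append(qualified(start.getName()));
        for (XMLEvent e : inside) {
            if (e instanceof Attribute) {
                Attribute a = (Attribute) e;
                if (!Character.isWhitespace(sb.charAt(sb.length() - 1))) {
                    sb.append(' '); // no space event was supplied: fall back to one space
                }
                sb.append(qualified(a.getName())).append("=\"")
                  .append(escape(a.getValue())).append('"');
            } else if (e instanceof Characters && isXmlSpace(((Characters) e).getData())) {
                sb.append(((Characters) e).getData()); // whitespace exactly as given
            } else {
                throw new IllegalArgumentException("unexpected event inside start tag: " + e);
            }
        }
        return sb.append('>').toString(); // end of list plays the "end of start tag" event
    }

    private static String qualified(QName name) {
        return name.getPrefix().isEmpty()
                ? name.getLocalPart() : name.getPrefix() + ":" + name.getLocalPart();
    }

    private static boolean isXmlSpace(String s) {
        for (char c : s.toCharArray()) {
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
        }
        return !s.isEmpty();
    }

    private static String escape(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static String demo() {
        XMLEventFactory f = XMLEventFactory.newInstance();
        List<XMLEvent> inside = Arrays.asList(
                f.createIgnorableSpace("  "),
                f.createAttribute("groupId", "org.example"),
                f.createIgnorableSpace("\n    "),
                f.createAttribute("artifactId", "demo"),
                f.createIgnorableSpace(" "));
        return renderStartTag(f.createStartElement("", "", "dependency"), inside);
    }
}
```

Only the writer side is shown; a matching reader would have to raise the same interleaved events while tokenizing the start tag.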