On Dec 1, 2004, at 3:52 AM, Ceki Gülcü wrote:
Hello Curt,
You should also look at DOMConfigurator, particularly lines 758 to 768 (CVS HEAD). Here are those lines:
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.setErrorHandler(new SAXErrorHandler());
docBuilder.setEntityResolver(new Log4jEntityResolver());

// we change the system ID to a valid URI so that Crimson won't
// complain. Indeed, "log4j.dtd" alone is not a valid URI which
// causes Crimson to barf. The Log4jEntityResolver only cares
// about the "log4j.dtd" ending.
inputSource.setSystemId("dummy://log4j.dtd");

Document doc = docBuilder.parse(inputSource);
That is how we have resolved "log4j.dtd" in the past.
That would cause any external entities (similar to included files in C) in the configuration file to be improperly resolved. Likely no one has ever cared though.
At 07:57 AM 12/1/2004, Curt Arnold wrote:
I've been able to take a look at the current CVS version of JoranConfigurator. It does appear that it is unnecessarily discarding the base URL that would be needed to resolve relative URIs in the document. If doConfigure(URL) and doConfigure(String) called doConfigure(InputSource) instead of calling doConfigure(InputStream), the base URL would be preserved and relative URLs could be resolved.
Really? I would have never thought of that.
Yep. That is really the reason there is an InputSource class to begin with. XML parsing requires both an InputStream and a base URI to resolve external entities like DTD's.
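To illustrate the point about keeping the base URL, here is a minimal sketch in Java. The class and method names (BaseUrlSketch, fromUrl, fromFile) are made up for illustration and are not part of JoranConfigurator. It only shows that an InputSource built from a system ID carries the location that the parser needs to resolve relative references.

```java
import java.io.File;
import java.net.URL;
import org.xml.sax.InputSource;

// Hypothetical helper, not log4j code: builds InputSources that keep their base URL.
class BaseUrlSketch {

    // The system ID records where the document lives. When no byte or
    // character stream is set, the parser opens the system ID itself.
    static InputSource fromUrl(URL url) {
        return new InputSource(url.toExternalForm());
    }

    // Same idea for a file name: convert it to a URI first so the
    // system ID is a usable base for relative references.
    static InputSource fromFile(File file) {
        return new InputSource(file.toURI().toString());
    }
}
```

A doConfigure(URL) written this way would hand the result to doConfigure(InputSource) instead of opening the stream itself, so a relative reference such as "log4j.dtd" would be resolved against the configuration file's own location.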
There are a couple of other troubling things about the code:
Substantial duplication between doConfigure(URL) and doConfigure(String)
Yes, but I think necessary duplication. You need to be able to collect errors specific to doConfigure(URL) or doConfigure(String), hence the duplication. If you can think of something better, I'm all ears.
I think that if you use SaxParser.parse(File) and SaxParser.parse(String url) (as appropriate), you will get even better messages and it will push some of the complexity off into the parser. To avoid having unnecessary duplicate code, you could either have an anonymous inner class that is used to do just the parse step or collect all the surrounding code into a few private methods so the body of doConfigure(URL) is only 7 lines or so.
Substantial duplication in o.a.l.joran.util.checkIfWellFormed(URL) and checkIfWellformed(String)
Again, necessary duplication.
The configuration file is parsed twice, once in checkIfWellFormed and once in doConfigure. Each time reading the configuration from disk (or net or however the URL is resolved).
I assume that the two-pass approach is designed to make configuration atomic, so that any error in the configuration file results in no configuration being applied.
Exactly.
You could copy the InputStream into a byte array and use a ByteArrayInputStream to prevent the file from being accessed twice, but I think it would be better to parse only once.
It would be nicer to parse only once, but how do you guarantee well-formedness with only a single pass?
Double file access is the consequence of double parsing. Eliminating the latter implies eliminating the former.
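For reference, the byte-array variant mentioned above could look roughly like the sketch below. It still parses twice, but fetches the configuration only once. BufferedConfigSketch, readFully, and isWellFormed are illustrative names, not existing log4j methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: fetch the configuration once, then parse the copy.
class BufferedConfigSketch {

    // Copy the whole stream into memory so it can be re-read at will.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // First pass: a parse with a do-nothing handler. DefaultHandler rethrows
    // fatal errors, so a document that is not well-formed ends up here as a
    // SAXException.
    static boolean isWellFormed(byte[] xml)
            throws IOException, ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

The second pass would wrap the same byte array in a fresh ByteArrayInputStream, so both passes are guaranteed to see identical content.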
One approach would be to try to stack configuration actions in a list, but I think that would be pretty complicated. It appears that the Joran configurator ignores any character content (other than within attribute). It should be fairly simple to parse the document and create a list of start and end element events, then replay the events into the Joran configurator.
You could do that but why bother?
I hate doing something twice. There is also the unlikely possibility that the content may change between requests (say if configuration file generated on URL access contained the current system time or some tracking number).
Eliminating the dual fetching of the configuration file would probably come close to paying for the expense of validating the configuration file against a log4j.dtd embedded as a resource.
I don't think you can declare all valid config files with a DTD. For one, Joran can configure sub-components of log4j components, or sub-components of sub-components, all of which were unknown at compile time. Joran can also be taught new parsing rules on the fly, rules which were unknown to the DTD. Maybe a very loose DTD would cut it. However, how would a DTD help with the double parse problem? Can you explain?
It doesn't help with the dual fetching problem; I was just thinking about what could be done with the saved time, but I've reconsidered. I think that most users would just want us to do the best we can with a well-formed but invalid (per the DTD) configuration file.
JoranConfigurator, as far as I can tell, wouldn't currently complain about elements containing character content.
??
<log4j:configuration xmlns="...">Four score and seven years ago, our fathers...</log4j:configuration>
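If such content should be reported rather than silently ignored, one possible approach is a SAX handler that collects any non-whitespace text it sees. This is only a sketch; CharacterContentSketch and findStrayText are made-up names and nothing like this is claimed to exist in Joran.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers character content that is not just whitespace.
class CharacterContentSketch extends DefaultHandler {

    final List<String> strayText = new ArrayList<String>();

    // The parser may deliver text in several chunks; each chunk is
    // checked on its own, which is enough to detect stray content.
    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            strayText.add(text);
        }
    }

    // Parse a document held in a string and return the stray text found.
    static List<String> findStrayText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        CharacterContentSketch handler = new CharacterContentSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.strayText;
    }
}
```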
Implicit is misspelled in "addImplcitAction", getapplicableActionList on org.apache.joran.Interpreter.
Thanks.
I'd be willing to take a shot if you'd like.
Sure, please do.
There are a couple other things I'd like to fix while I'm at it and which should not add much complexity with the approach that I'm thinking about.
First, log4j depends on the log4j namespace prefix. From an XML Infoset view, the following documents are identical, but log4j will only correctly interpret the first:
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/"> <root/> </log4j:configuration>
<log4cxx:configuration xmlns:log4cxx="http://jakarta.apache.org/log4j/"> <root/> </log4cxx:configuration>
<configuration xmlns="http://jakarta.apache.org/log4j/"> <root xmlns=""/> </configuration>
An XML editor would be well within its rights to switch between the forms, which may result in an Infoset-identical but unusable configuration file.
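A namespace-aware parser reports the same namespace URI and local name for all three forms, which is easy to check. The sketch below uses the JAXP DOM API; NamespaceNameSketch and its methods are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo: prints names as {namespace-uri}local-name, ignoring prefixes.
class NamespaceNameSketch {

    static String expandedName(Element e) {
        String ns = e.getNamespaceURI();
        return "{" + (ns == null ? "" : ns) + "}" + e.getLocalName();
    }

    // Describe the document element and its first child element.
    static String describe(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default in JAXP
        Element doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Node child = doc.getFirstChild();
        while (child != null && child.getNodeType() != Node.ELEMENT_NODE) {
            child = child.getNextSibling(); // skip whitespace text nodes
        }
        return expandedName(doc) + " > " + expandedName((Element) child);
    }
}
```

For each of the three documents above, describe should return the same string, with the document element in the log4j namespace and the root element in no namespace. Code that compares these names instead of the literal "log4j:configuration" would be indifferent to the prefix.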
Second, log4j has the legal but unusual structure of a namespace qualified document element <log4j:configuration> with non-namespaced child elements. I'd like to allow log4j to accept either non-namespaced or log4j namespace qualified child elements. So log4j could take the following two documents which are not Infoset equivalent to the previous documents, but would be interpreted identically.
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/"> <log4j:root/> </log4j:configuration>
<configuration xmlns="http://jakarta.apache.org/log4j/"> <root/> </configuration>
Third, have log4j ignore elements that are namespace qualified, but are not in a log4j recognized namespace. This would allow existing Resource Description Framework (RDF) metadata to appear in configuration files:
<configuration xmlns="http://jakarta.apache.org/log4j/">
<rdf:Description xmlns:rdf="http://www.w3.org/..." xmlns:dc="http://...">
<dc:creator>Ceki Gulku</dc:creator>
<dc:title>The most beautiful logging configuration file in the world</dc:title>
</rdf:Description>
<root/>
</configuration>
The approach that I'm thinking of is to parse the configuration once, collect objects that correspond to the startElement and endElement events but with some minor namespace normalization and dropping foreign elements, then replay those normalized events to JoranInterpreter. It should not be necessary to make any changes to JoranInterpreter.
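A rough sketch of that record-and-replay idea, under some assumptions: events are replayed to a plain SAX ContentHandler (how they would actually be fed to JoranInterpreter is not shown), elements in the log4j namespace or in no namespace are both treated as recognized, and EventRecorderSketch with its members is a made-up name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical recorder: parse once, normalize namespaces, replay later.
class EventRecorderSketch extends DefaultHandler {
    static final String LOG4J_NS = "http://jakarta.apache.org/log4j/";

    // One recorded event; attrs is null for an end-element event.
    static final class Event {
        final String name;
        final Attributes attrs;
        Event(String name, Attributes attrs) { this.name = name; this.attrs = attrs; }
    }

    final List<Event> events = new ArrayList<Event>();
    private int foreignDepth = 0; // > 0 while inside a foreign element

    private static boolean isRecognized(String uri) {
        return uri == null || uri.isEmpty() || LOG4J_NS.equals(uri);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        if (foreignDepth > 0 || !isRecognized(uri)) {
            foreignDepth++; // drop the foreign element and its whole subtree
            return;
        }
        // The parser may reuse its Attributes object, so keep a copy.
        events.add(new Event(local, new AttributesImpl(attrs)));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (foreignDepth > 0) {
            foreignDepth--;
            return;
        }
        events.add(new Event(local, null));
    }

    // Replay the normalized events as non-namespaced elements.
    void replay(ContentHandler target) throws SAXException {
        for (Event e : events) {
            if (e.attrs != null) {
                target.startElement("", e.name, e.name, e.attrs);
            } else {
                target.endElement("", e.name, e.name);
            }
        }
    }

    // Parse a document held in a string; a fatal error aborts before any replay.
    static EventRecorderSketch record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        EventRecorderSketch recorder = new EventRecorderSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
        return recorder;
    }
}
```

Because nothing is replayed until the parse has finished without a fatal error, a configuration that is not well-formed results in no configuration being applied, which keeps the atomic behaviour of the current two-pass approach with a single fetch and a single parse.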
